Generative AI applications have largely centered on large language models (LLMs), which have dominated the landscape since the advent of ChatGPT. These models have attracted massive investment and sparked waves of innovation across various sectors. However, as the field evolves, it’s worth asking whether model size is as crucial for every application as it seems.
Alongside the rise of powerful LLMs from major players like OpenAI, Anthropic, and Google, there has been growing interest in smaller language models (SLMs). Unlike LLMs, SLMs are trained on more compact and specialized datasets, making them less resource-intensive and cheaper to produce. This reduced cost allows companies to build and train their own SLMs, tailoring them to specific needs or tasks that may not require the vast scale of an LLM.
One key advantage of SLMs is their ability to operate in more constrained environments. Because they require fewer computational resources, SLMs can run on edge devices or mobile platforms, offering flexibility where an LLM’s processing demands would typically be prohibitive. This opens the door to generative AI in settings where computational resources are limited.
SLMs also introduce new possibilities for how developers design generative AI applications. With LLMs, the cost of retraining meant models had a fixed knowledge cutoff, often requiring external mechanisms like retrieval-augmented generation (RAG) to supply up-to-date information. SLMs, in contrast, are easier and more affordable to retrain, enabling more dynamic updates. By combining periodic retraining with RAG, developers can build systems that continuously pull in the most relevant, real-time data, helping the model adapt to user requests and improving the accuracy and relevance of its responses while maintaining efficiency.
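To make that combination concrete, here is a minimal sketch of a RAG loop wrapped around a small open model, using sentence-transformers for retrieval and the Hugging Face transformers pipeline for generation. The model name, documents, and prompt format are illustrative assumptions rather than a prescribed stack; the same pattern works with any small model a team chooses to fine-tune on its own data.

```python
# Minimal RAG sketch around a small language model (illustrative only).
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

# Hypothetical domain documents; in practice these would come from a
# regularly refreshed knowledge store.
documents = [
    "Our return policy allows refunds within 30 days of purchase.",
    "Support hours are 9am-5pm Eastern, Monday through Friday.",
    "The Pro plan includes priority support and a 99.9% uptime SLA.",
]

# Embed the documents once; re-embed whenever the knowledge store changes.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = embedder.encode(documents, convert_to_tensor=True)

# A small instruction-tuned model (placeholder choice). It can be
# periodically retrained on domain data, while RAG supplies the freshest
# facts at query time.
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

def answer(question: str, top_k: int = 2) -> str:
    # Retrieve the most relevant documents by embedding similarity.
    query_embedding = embedder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, doc_embeddings, top_k=top_k)[0]
    context = "\n".join(documents[hit["corpus_id"]] for hit in hits)

    # Ground the small model's response in the retrieved context.
    prompt = (
        f"Answer using only this context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    result = generator(prompt, max_new_tokens=100, return_full_text=False)
    return result[0]["generated_text"]

print(answer("When can I get a refund?"))
```

In this pattern, retraining keeps the small model fluent in the domain’s vocabulary and style, while the retrieval step supplies whatever has changed since the last training run, so neither mechanism has to carry the freshness burden alone.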