Updated on October 15, 2024
To date, the AI race has had two major fronts. On one side are models that boast ever-larger sizes and training datasets, such as GPT-4, Gemini 1.5, and Claude Opus; on the other are small language models (SLMs) built for speed and accuracy at a fraction of the size.
So, why do the leading companies in the space, even as they chase the largest parameter counts, also invest time in building smaller language models?
Well, it’s because, as Darren Oberst, the CEO of LLMWare, puts it, “My experience is that a small model can probably do 80% to 90% of what the ‘mega model’ can do … but you’re going to be able to do it at probably 1/100th the cost.”
Practically speaking, SLMs are tailor-made for business use cases. They’re relatively inexpensive, maintain similar levels of functionality, and can be great for customer support processes.
In this article, we’ll review the capabilities of these small language models and explore the following themes:
1. Parameter Sizes – Who’s Winning?
2. The Benefits of Small Language Models
3. Measuring the Performance of Small Language Models
4. Three SLMs For a Customer Service Chatbot
5. Conclusion
Parameter Sizes – Who’s Winning?
Parameter counts are a frequent source of confusion when discussing the size of an LLM. For example, GPT-4 is rumored to have well over a trillion parameters.
This, however, doesn’t refer to any unit of physical memory. Instead, it refers to the number of learned weights — the numerical connections the model adjusts during training to capture relationships between words.
Let’s make it more intuitive by walking through how an LLM works:
1. During pre-training, the LLM encodes large amounts of unlabeled text into numerical vectors (embeddings).
2. While encoding, the model learns the relationships between these words and forms connections between them.
3. These connections enable LLMs to understand and reply in natural language, and each learned connection (weight) counts as a parameter.
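To make the earlier point about parameters and memory concrete, here’s a quick back-of-envelope calculation. This is a hedged sketch: the precision sizes are standard, but the model sizes are illustrative assumptions, and real deployments also need memory for activations and caches.

```python
# Rough memory needed just to load a model's weights, given its
# parameter count and numeric precision. Weights only -- activations,
# KV cache, and framework overhead add more on top.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Approximate gigabytes of memory to hold the model weights."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

# A 1.5B-parameter SLM fits comfortably on consumer hardware...
print(weight_memory_gb(1.5e9))  # 3.0 GB in fp16
# ...while a hypothetical 1-trillion-parameter LLM does not.
print(weight_memory_gb(1e12))   # 2000.0 GB in fp16
```

This is why parameter count, while not itself a unit of memory, effectively determines the hardware a model needs.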
Given that LLMs like Gemini 1.5, Claude Opus, and GPT-4o have been trained on massive datasets (some speculate they cover most of the internet’s textual data), the number of connections they can draw is enormous. So the parameter sizes keep increasing.
However, most use cases don’t require that level of sophistication. Since most business deployments involve fine-tuning or retrieval-augmented generation (RAG), a larger model simply creates a computational bottleneck.
So, while SLMs might not be used to achieve AGI, they fit perfectly for most business and consumer use cases. Let’s see how by talking about their benefits.
Benefits of Small Language Models
Small language models are the faster counterparts of LLMs. They use similar architectures but have far fewer parameters and are often trained on limited (and very targeted) knowledge domains. This gives them some distinct advantages over LLMs.
1. Speed and Inference Time Improvements – It’s useful to think of a model’s parameters as the space it searches for an answer.
The larger that space, the more computation each response requires and the longer it takes (think of how OpenAI o1 might take minutes to answer a question that GPT-4o Mini can answer in seconds).
This speed improvement also mimics human conversation speed and is helpful for use cases like customer support, where natural conversations are essential.
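The latency gap above follows from a common rule of thumb: generating one token costs roughly two floating-point operations per model parameter. The sketch below applies that rule; the hardware throughput and utilization figures are assumptions for illustration, not benchmarks of any real accelerator.

```python
# Back-of-envelope inference latency from parameter count, using the
# ~2 FLOPs per parameter per generated token rule of thumb.
def seconds_per_token(num_params: float,
                      hardware_flops: float = 1e14,  # assumed peak throughput
                      utilization: float = 0.3       # assumed efficiency
                      ) -> float:
    return (2 * num_params) / (hardware_flops * utilization)

small = seconds_per_token(1.5e9)  # a 1.5B-parameter SLM
large = seconds_per_token(1e12)   # a hypothetical 1T-parameter LLM
print(f"SLM: {small * 1000:.2f} ms/token")
print(f"LLM: {large * 1000:.2f} ms/token")
```

Whatever hardware numbers you plug in, per-token latency scales linearly with parameter count, which is why a small model feels so much more conversational.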
2. Domain-Specific Knowledge – While an encyclopedia has a broader range of knowledge available, it’s not exactly suitable for learning more about specific topics or domains.
You have the same problem with LLMs. While they’re trained on a vast amount of data, they lack domain-specific knowledge that they can use to answer specific questions. SLMs, in contrast, can be trained to be domain-specific, and some specialized models like FinGPT are built to provide domain-specific knowledge.
3. Privacy-First – LLMs are computationally expensive, and unless you have the data infrastructure to run them yourself, your data needs to be sent to a cloud provider. Additionally, there have been concerns about how larger providers might use user-provided data for training purposes.
SLMs can run on much smaller infrastructure, including on-site. This makes them extremely useful for sensitive cases where data protection must be prioritized.
4. Accuracy – While the prevailing assumption has been that LLMs are more accurate than SLMs, recent research has questioned it. According to the latest findings, LLMs become more unreliable as they scale and can struggle to answer straightforward questions.
Now, consider that most customer support questions are indeed simple. And you can see why companies like Kommunicate prefer smaller models like Claude Haiku over Claude Opus or Sonnet.
These are the four key reasons to consider SLMs over LLMs for many tasks. This can be further demonstrated by examining how SLMs perform against their LLM counterparts.
Measuring the Performance of SLMs
In a recent research paper, scientists have quantified the way SLMs have improved over the past few years:
1. SLMs improved by 13.5% from 2022 to 2024, while LLM performance improved by 7.5%.
2. The latest SLMs, like Microsoft’s Phi-3 and Qwen 2.5, outperform the open-source Llama 3.1 8B model, showing best-in-class performance for their size.
3. In small models, parameter size usually determines accuracy. However, models like Qwen2 with just 1.5B parameters showcase excellent capability on specific tasks.
4. These SLMs use far less memory at runtime, showing they can run on everyday devices.
5. SLMs trained on high-quality datasets can outperform LLMs in specific contexts.
These insights show how much SLMs have improved and how useful they can be for businesses that want to use them for customer service operations. Indeed, we at Kommunicate have incorporated three state-of-the-art (SOTA) SLMs, which you can try out now.
Three SLMs For a Customer Service Chatbot
At Kommunicate, you can connect with various AI models from Google, Amazon, IBM, and other major companies. We’re fully AI agnostic.
However, as our mainstay for generative AI chatbots, we use three primary SLMs, and you can try them out right now:
1. Gemini Flash – Google’s lightweight, speed-optimized model uses the latest NLP methods and an improved transformer architecture. It’s remarkably fast and can provide human-like answers to your questions.
2. Claude Haiku – Anthropic’s Haiku is their lightweight model, scoring very well on benchmarks while giving quick, contextual answers.
3. GPT-4o Mini – Available for free on the ChatGPT website, GPT-4o Mini is equipped with advanced NLP and powered by OpenAI’s research. It outshines previous small models while using much less memory at runtime.
You can access all three models for free from the dashboard in the Kommunicate app under Integrations. We also offer integrations with Amazon Lex, Dialogflow, and other AI platforms if you’re interested.
Conclusion
While larger models like OpenAI o1 capture most of the attention in the latest news cycle, SLMs have become increasingly popular for business use cases. They’re cost-efficient, smaller alternatives to LLMs that can be inexpensively trained to provide domain-specific answers.
So, if you’re looking to build a chatbot for your fintech enterprise, an SLM will give you faster, more computationally efficient answers. Researchers have quantified these performance improvements and shown how SLMs can outperform LLMs when trained for domain-specific tasks.
At Kommunicate, we offer you access to three SOTA SLMs that you can use to create a chatbot for your business. You can see how they work by signing up for a 30-day free trial.
As a seasoned technologist, Adarsh brings over 14 years of experience in software development, artificial intelligence, and machine learning to his role. His expertise in building scalable and robust tech solutions has been instrumental in the company’s growth and success.