The Rise of Language Models

Language models have been a subject of research since the 1950s. The field began with rule-based systems and statistical models, which were innovative for their time but performed poorly and were limited in their ability to truly "understand" natural human language [1].

A true turning point came with the emergence of large language models (LLMs). Groundbreaking models like BERT and GPT showcased the enormous potential of this approach. Able to both interpret and generate coherent text, LLMs can perform a wide variety of natural language tasks with human-like fluency.

However, the impressive capabilities of LLMs come with a significant trade-off: the need for immense computational resources. For example, the renowned GPT-3 model, with its 175 billion parameters, is prohibitively expensive for smaller organizations to train or maintain [2]. As a result, streamlined, lower-cost alternatives known as small language models (SLMs) have been on the rise. But how do these smaller models stack up against their larger counterparts?

LLMs vs. SLMs: How Size Matters

As the names suggest, the main difference between LLMs and SLMs is size and complexity. Compared to LLMs, SLMs are designed to be more compact, with fewer parameters or smaller architectures, and they require less training data. SLMs range from around a hundred million to under 30 billion parameters, while LLMs can have hundreds of billions. Amazon’s recent model, Olympus, is even reported to have an astonishing 2 trillion parameters [3].
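To make the size gap more concrete, the sketch below estimates how much memory is needed just to hold a model's weights. It is a rough illustration, not a sizing guide: it assumes 2 bytes per parameter (16-bit precision), which the article does not state, and it ignores everything else a running model needs. The function name and the sample sizes other than GPT-3's 175 billion are chosen for illustration.

```python
# Rough estimate of the memory needed to store a model's weights.
# Assumption (not from the article): 2 bytes per parameter, i.e. 16-bit floats.
BYTES_PER_PARAM = 2


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Return the approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Illustrative sizes: a small SLM, a 7-billion-parameter SLM, and GPT-3.
models = {
    "SLM, 100 million parameters": 100e6,
    "SLM, 7 billion parameters": 7e9,
    "GPT-3, 175 billion parameters": 175e9,
}

for name, params in models.items():
    print(f"{name}: ~{weight_memory_gb(params):,.1f} GB of weights")
```

Under this assumption, a 7-billion-parameter model needs about 14 GB for its weights, while GPT-3 needs about 350 GB, far more than a single typical GPU can hold.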

| Feature | Large Language Models | Small Language Models |
| --- | --- | --- |
| Parameter Size | Hundreds of billions to trillions of parameters | Tens of millions to under 30 billion parameters |
| Computational Resources | Requires significant computational resources | Requires fewer computational resources |
| Training Data | Trained on vast, diverse datasets across many domains | Trained on smaller, often domain-specific datasets |
| Training Time | Long training time | Shorter training time |
| Latency | More latency | Less latency |
| Inference Speed | Generally slower due to complexity | Faster due to smaller size |
| Versatility | High versatility; capable of handling a wide range of tasks | Typically specialized for specific tasks or domains |
| Cost | High development and deployment cost | Lower development and deployment cost |
| Use Cases | Complex, multi-step tasks, general-purpose applications | Targeted applications, domain-specific tasks |
| Real-World Examples | OpenAI’s GPT-3/4, Google’s BERT | Meta’s LLaMA 3, Mistral’s 7B |
| Deployment Feasibility | Suited for large organizations with ample resources | More accessible for smaller organizations |

Choosing the Right Fit

As the comparison shows, choosing between LLMs and SLMs is largely a trade-off between capability and cost-effectiveness.

With sufficient time and resources, the LLM approach can produce a complex, general-purpose model with the comprehensive understanding and reasoning needed to process complicated queries. The most prominent real-world example is OpenAI’s ChatGPT, a chatbot that can handle a wide range of advanced natural language tasks. Beyond everyday conversation, ChatGPT is capable of language translation, text summarization, content creation, proofreading, text-based file processing, code generation, and much more. Because it can store and “remember” previous queries within a single session, users can refine their prompts step by step to steer the model’s responses. With such impressive capabilities, it comes as no surprise that LLMs are popular with everyday users. As of July 2024, ChatGPT had over 180 million users [4], while Google’s BERT was downloaded more than 100,000 times per week [5].

However, not every organization has the luxury of creating models as complex as LLMs. In cases where resources are finite or the objective has been clearly identified, SLMs might prove to be a better option. At a lower price for both development and deployment, SLMs can provide solutions at a faster speed, tailored to a specific domain or problem. The GPT-4 model, which is reported to have more than 1.7 trillion parameters, generates about 28 tokens per second, while the newer everyday-use GPT-4o mini model, estimated at around 8 billion parameters, outputs about 133 tokens per second. The smaller model also has lower latency: 0.41 seconds for GPT-4o mini compared with 0.58 seconds for GPT-4 [6].
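A quick back-of-the-envelope calculation shows what these figures mean for a user waiting on a response. The sketch below uses the latency and throughput numbers quoted above [6] and assumes, as a simplification, that total response time is the latency plus the output length divided by throughput. The 500-token response length and all names in the code are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelSpeed:
    name: str
    latency_s: float     # delay before output starts, in seconds (from [6])
    tokens_per_s: float  # output throughput in tokens per second (from [6])


def response_time_s(model: ModelSpeed, output_tokens: int) -> float:
    """Simplified model: latency plus time to stream all output tokens."""
    return model.latency_s + output_tokens / model.tokens_per_s


gpt4 = ModelSpeed("GPT-4", latency_s=0.58, tokens_per_s=28)
gpt4o_mini = ModelSpeed("GPT-4o mini", latency_s=0.41, tokens_per_s=133)

OUTPUT_TOKENS = 500  # hypothetical response length, not from the article

for model in (gpt4, gpt4o_mini):
    seconds = response_time_s(model, OUTPUT_TOKENS)
    print(f"{model.name}: ~{seconds:.1f} s for {OUTPUT_TOKENS} tokens")
```

Under these assumptions, a 500-token answer takes roughly 18.4 seconds from GPT-4 and roughly 4.2 seconds from GPT-4o mini, so throughput, rather than the sub-second latency gap, accounts for most of the difference.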

For tailored usage, models such as Meta’s LLaMA 3 or Mistral’s 7B have been widely used for specific tasks, such as building smart chatbots, virtual assistants, and text analytics tools, at a fraction of the cost [7]. This is most evident in healthcare, where these models support workflows across a wide range of applications. Through a collaboration with NVIDIA, LLaMA 3 has performed particularly well in this sector and has been adopted by a wide range of companies. Examples include augmented-reality real-time surgical guidance by Activ Surgical, a healthcare-specific chatbot by AITEM, and clinical information extraction and translation by Mendel AI [8]. In specific domains, or where tasks are less complex, SLMs show their real strength: cost-effectiveness. Training and fine-tuning an SLM for a specific domain or task can even yield performance that exceeds that of LLMs, thanks to specialization.

Harnessing the Power of Language Models

Local organizations in Vietnam are also actively harnessing the potential of language models to enhance productivity and innovation across industries. A notable example is FPT Software’s recent launch of SemiKong, the world’s first open-source LLM specifically designed for the semiconductor industry. SemiKong demonstrates superior performance, outpacing both GPT-3 and LLaMA 3 on semiconductor-related tasks, and significantly improves cost-efficiency [9]. Mr. Nguyen Xuan Phong, AI Director at FPT Software, stated, "FPT Software is excited to be part of this innovative initiative and is eager to see the potential outcomes from the synergy between semiconductor technology and AI." This initiative is an encouraging start for Vietnamese companies to adopt language models to drive innovation, optimize processes, and boost overall performance.

For organizations looking to follow suit, the choice to go big or go small ultimately hinges on their unique needs and available resources. LLMs provide unmatched depth and versatility, making them ideal for complex, multi-step tasks that require nuanced understanding. SLMs, on the other hand, excel in efficiency, speed, and cost-effectiveness, making them well-suited for targeted applications, especially when resources are constrained.

As technology continues to advance, both LLMs and SLMs will evolve, broadening their impact across various fields and applications. The critical factor lies in selecting the model that best aligns with an organization’s specific goals, thereby ensuring the most effective and impactful outcomes.

 

Author FPT Software