How LLMs Work (No Code Required)
Large language models are the technology behind tools like ChatGPT, Claude, and Gemini. Here is how they work in plain terms.
The Core Idea
An LLM is a system that predicts the next word in a sequence. Given the text "The cat sat on the," it predicts "mat" as a likely next word. When you scale this up to billions of parameters and train on enormous datasets, the model becomes capable of remarkably complex text generation.
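If you're curious what "predicting the next word" looks like in practice, here is a deliberately tiny sketch. It counts which words follow which in a toy corpus and predicts the most frequent follower. Real LLMs use neural networks rather than lookup tables, and they operate on tokens, not whole words, so treat this only as an illustration of the core idea; the corpus and function names are invented for the example.

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count word pairs in a tiny corpus.
# Real LLMs learn these patterns with neural networks at vastly larger scale.
corpus = (
    "the cat sat on a mat . "
    "the dog sat on a rug . "
    "the cat sat on a mat ."
).split()

# For each word, count how often each other word follows it.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" more often than "dog" does
print(predict_next("sat"))  # "on" always follows "sat" here
```

Scale the counting up to billions of learned parameters and trillions of words of training text, and you get the qualitative leap described above.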
Training: How Models Learn
Training an LLM involves three stages:
1. Pre-training
The model reads an enormous amount of text from the internet — books, articles, code, conversations. It learns the statistical patterns of language: grammar, facts, reasoning patterns, and styles. This stage requires massive computing resources and can take weeks or months.
2. Fine-tuning
After pre-training, the model is further trained on curated examples to improve its ability to follow instructions and be helpful. This is what turns a raw text predictor into a useful assistant.
3. Alignment
Additional training ensures the model behaves safely and helpfully. Techniques like RLHF (Reinforcement Learning from Human Feedback) help the model learn what humans consider good responses.
Parameters: The Model's Memory
When people say a model has "70 billion parameters," they are referring to the numbers that define the model's learned patterns. More parameters generally mean more capacity to capture complex patterns, but they also make the model more expensive to run.
| Model Size | Parameters | Typical Use |
|---|---|---|
| Small | 1–7B | Simple tasks, fast responses, runs on laptops |
| Medium | 8–30B | Balanced capability and cost |
| Large | 30–100B | Complex reasoning, nuanced writing |
| Frontier | 100B+ | State-of-the-art performance |
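The parameter count also tells you roughly how much memory a model needs. A minimal back-of-the-envelope sketch, assuming the common 16-bit deployment format (2 bytes per parameter) and ignoring everything except the weights themselves:

```python
# Rough memory needed just to hold a model's weights.
# Assumes 16-bit precision (2 bytes per parameter), a common deployment
# format; activations and other runtime overhead are not included.

def weight_memory_gb(params_billions, bytes_per_param=2):
    """Gigabytes required to store the weights alone."""
    return params_billions * 1e9 * bytes_per_param / 1e9

for size in (7, 30, 70):
    print(f"{size}B parameters -> ~{weight_memory_gb(size):.0f} GB of weights")
```

This is why a 7B model (roughly 14 GB of weights at 16-bit) can run on a well-equipped laptop while frontier models require datacenter hardware.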
Tokens: How Models Read Text
Models do not read words — they read tokens. A token is a chunk of text, typically about 3/4 of a word. "Understanding" might be two tokens: "Under" + "standing."
This matters because:
- Pricing is per-token
- Context windows are measured in tokens
- Longer text costs more and takes longer to process
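The points above can be turned into a quick estimate. This sketch uses the rough three-quarters-of-a-word-per-token ratio mentioned earlier; the price per million tokens is a made-up placeholder, not any provider's real rate.

```python
# Back-of-the-envelope token count and cost estimate.
WORDS_PER_TOKEN = 0.75            # rough average for English text
PRICE_PER_MILLION_TOKENS = 3.00   # hypothetical dollars, for illustration only

def estimate_tokens(word_count):
    """Approximate token count for a given number of English words."""
    return round(word_count / WORDS_PER_TOKEN)

def estimate_cost(word_count):
    """Approximate processing cost in dollars at the placeholder rate."""
    return estimate_tokens(word_count) * PRICE_PER_MILLION_TOKENS / 1_000_000

tokens = estimate_tokens(1500)  # a ~1,500-word document
print(f"~{tokens} tokens, costing about ${estimate_cost(1500):.4f}")
```

Actual tokenization varies by model, so real token counts can differ noticeably from this estimate; providers publish exact tokenizers and prices for precise figures.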
Key Takeaway
LLMs are sophisticated pattern-matching systems. They do not understand meaning the way humans do, but their pattern matching is good enough to be enormously useful for practical tasks.