What Is a Large Language Model?
A Large Language Model (LLM) is an AI system trained on enormous volumes of text – books, websites, code, scientific papers, conversations – to learn the patterns, structure, and meaning of human language. The word “large” refers both to the volume of training data and to the number of internal parameters (adjustable numerical weights) the model uses to process information.
The defining capability of an LLM is predicting what comes next. Given any text input, the model calculates what word, phrase, or sentence would logically follow – not by looking things up in a database, but by pattern-matching against everything it absorbed during training.
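This pattern-matching idea can be illustrated with a deliberately tiny sketch – not a real LLM, just a word-frequency model over a toy corpus. It shows the core move: predict the next word by sampling from observed patterns, not by looking anything up.

```python
import random
from collections import Counter, defaultdict

# Toy corpus; a real LLM trains on trillions of tokens, not a sentence.
corpus = "the cat sat on the mat the cat ran on the grass".split()

# Count which word follows which: the "patterns absorbed during training".
follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def predict_next(word):
    # Sample the continuation in proportion to how often it was observed.
    counts = follows[word]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

print(predict_next("the"))  # "cat" is twice as likely as "mat" or "grass"
```

Real models replace the frequency table with billions of learned parameters and predict over subword tokens rather than whole words, but the prediction-by-pattern principle is the same.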
How LLMs Learn: The Training Pipeline
Step 1: Data Collection. Enormous text corpora – books, websites, code, scientific papers, conversations – are gathered, filtered, and deduplicated.
Step 2: Learning to Predict. The model is pretrained on a single task: predict the next token. Every prediction error nudges its parameters toward the patterns in the data.
Step 3: Alignment (RLHF). Human feedback then fine-tunes the raw text predictor into a helpful, safe assistant.
The Transformer Architecture
The Transformer is the engine underneath every modern LLM. Here’s a simplified view of how it processes your input:
graph TD
A[Your Input Text] --> B[Tokenization]
B --> C[Token Embeddings]
C --> D[Self-Attention Layers]
D --> E[Feed-Forward Networks]
E --> F[Output Probabilities]
F --> G[Generated Token]
G --> |"Feeds back as input"| C
style A fill:#e8f4fd,stroke:#2196F3
style D fill:#fff3e0,stroke:#FF9800
style G fill:#e8f5e9,stroke:#4CAF50
Parameters are the numerical weights the model learns during training. Think of them as the knobs on a massive mixing board:
| Model | Parameters | Scale |
|---|---|---|
| GPT-2 (2019) | 1.5 billion | Small by today’s standards |
| Llama 3 | 8B – 70B | Mid-range, very capable |
| GPT-4 | ~1.8 trillion (estimated) | Frontier scale |
| Claude Opus 4 | Undisclosed | Frontier scale |
More parameters generally mean more capacity to learn patterns, but data quality and training technique matter more than raw size.
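To make these parameter counts less abstract, here is a back-of-the-envelope count for a GPT-style transformer block, assuming the common layout (four attention projection matrices plus a feed-forward network four times wider than the model dimension). The configuration numbers are hypothetical, and real models add embeddings, normalization layers, and biases on top.

```python
def block_params(d_model: int) -> int:
    """Approximate weights in one transformer block."""
    attention = 4 * d_model * d_model           # Q, K, V, and output projections
    feed_forward = 2 * d_model * (4 * d_model)  # up- and down-projection
    return attention + feed_forward

d_model, n_layers = 4096, 32  # hypothetical mid-size configuration
total = n_layers * block_params(d_model)
print(f"{total / 1e9:.1f}B parameters")  # ≈ 6.4B for these numbers
```

Scaling `d_model` and `n_layers` up is essentially how the jump from the millions to the trillions in the table above happens.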
Inference is what happens when you actually use the model – sending it a prompt and getting a response. During inference, the model:
- Tokenizes your input
- Passes it through all transformer layers
- Calculates probability distributions for the next token
- Samples a token from that distribution
- Repeats until the response is complete
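The loop above can be sketched in a few lines. The model here is a stand-in – a function that returns a next-token probability distribution over a hypothetical four-token vocabulary; in a real LLM that call is the full transformer forward pass.

```python
import random

def fake_model(tokens):
    # Stand-in for the forward pass: returns next-token probabilities.
    # The vocabulary and numbers here are invented for illustration.
    return {"the": 0.5, "cat": 0.3, "sat": 0.15, "<eos>": 0.05}

def generate(prompt_tokens, max_tokens=10):
    tokens = list(prompt_tokens)          # step 1: tokenized input
    for _ in range(max_tokens):
        probs = fake_model(tokens)        # steps 2-3: forward pass -> distribution
        choices, weights = zip(*probs.items())
        token = random.choices(choices, weights=weights)[0]  # step 4: sample
        if token == "<eos>":              # model signals the response is complete
            break
        tokens.append(token)              # step 5: feeds back as input, repeat
    return tokens

print(generate(["once", "upon"]))
```

Note that the model is called once per generated token – the structural reason generation cost scales with output length.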
Each generated token requires a full forward pass through the entire model, while input tokens can be processed together in a single parallel pass – this is why providers typically charge more for output tokens than for input tokens.
Hallucination: The Core Limitation
Because LLMs generate text based on patterns rather than retrieving verified facts, they can produce plausible-sounding but incorrect information. This occurs more often with:
- Obscure topics – less training data means weaker patterns
- Recent events – anything after the training cutoff date
- Numerical/factual tasks – LLMs are pattern matchers, not calculators
- Requests for citations – models often fabricate realistic-looking but nonexistent references
Why Hallucination Cannot Be Fully Eliminated
Hallucination is inherent to the architecture. LLMs don’t have a “truth database” they check against – they generate the most statistically likely continuation of your prompt. When the model lacks strong patterns for a topic, it fills the gap with plausible-sounding content. This is the same mechanism that makes LLMs creative and flexible – it’s a double-edged sword.
The most effective mitigation is RAG (Retrieval-Augmented Generation) – giving the model access to verified source documents before generating a response. This is covered in the Memory & RAG page.
Other Key Limitations
Training cutoff. LLMs have a knowledge cutoff date. They have no awareness of events after that date unless given tools that access current information (like web search).
No persistent memory by default. Each conversation starts fresh. The model doesn’t remember previous sessions unless external memory systems are implemented.
Not deterministic. The same prompt given twice will often produce different outputs. LLMs operate probabilistically – each token is sampled from a probability distribution.
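This sampling step is also where the familiar temperature setting lives: temperature reshapes the probability distribution before a token is drawn. A minimal sketch, with invented logits and vocabulary:

```python
import math
import random

def sample(logits, temperature=1.0):
    # Divide logits by temperature, then softmax into probabilities.
    # Low temperature sharpens the distribution (near-deterministic);
    # high temperature flattens it (more varied, more surprising).
    scaled = {tok: l / temperature for tok, l in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    probs = {tok: math.exp(v) / z for tok, v in scaled.items()}
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights)[0]

logits = {"yes": 2.0, "no": 1.5, "maybe": 0.5}  # hypothetical model output
print({sample(logits) for _ in range(20)})  # usually more than one distinct token
```

Even at temperature 0, implementations can differ slightly across runs, so identical outputs are never fully guaranteed.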
Context window is finite. Even a 1-million-token context window has limits. Very long inputs can cause the model to lose coherence or “forget” earlier parts.
Key Concepts Summary
Tokens
The fundamental unit of LLM processing. Everything – cost, limits, performance – traces back to token count.
Context Window
The model’s working memory. Holds your prompt, history, system instructions, and the response – all competing for finite space.
RLHF
Reinforcement Learning from Human Feedback. The process that turns a raw text predictor into a helpful, safe assistant.
Hallucination
When the model generates confident but false information. A structural feature, not a fixable bug.
What’s Next?
Tokens & Pricing
Understand how tokens work, what they cost, and how to optimize your spend across different models.
Choosing a Model
Navigate the 2025-2026 model landscape and pick the right model for your use case.
AI Agents Explained
Learn how LLMs evolve from chat assistants into autonomous agents that use tools and make decisions.