A
Agent
An AI system that can take actions in the real world – not just generate text. Agents use tools (APIs, databases, code execution) to complete tasks autonomously. Key difference from a chatbot: agents do things, chatbots say things.
Related: ReAct, Tool Use, Multi-Agent Systems
Alignment
The process of making an AI model’s behavior match human intentions and values. Primarily achieved through RLHF (Reinforcement Learning from Human Feedback). Alignment is what turns a raw text predictor into a helpful, safe assistant.
Why it matters: An unaligned model produces outputs that mirror all patterns in training data, including harmful ones.
Attention Mechanism
The core innovation of the Transformer architecture. Allows each token in a sequence to “look at” every other token and determine relevance. This is why LLMs can understand that “it” in “The cat sat on the mat because it was tired” refers to the cat, not the mat.
Related: Transformer, Self-Attention
API (Application Programming Interface)
A standardized way for software to communicate. In AI, this typically means sending a prompt to a model hosted on a server and receiving a response. OpenAI, Anthropic, and Google all provide APIs for their models.
Example: Your n8n workflow calls the Claude API to process customer emails.
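A minimal sketch of what such a call sends, assuming a messages-style JSON body (exact field names vary by provider, and the model name here is illustrative):

```python
import json

def build_chat_request(prompt, model="claude-example", max_tokens=512):
    # Typical messages-style request body; field names vary by provider
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

body = json.dumps(build_chat_request("Summarize this customer email: ..."))
```

This body would be POSTed to the provider's chat endpoint with your API key in a request header.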
B
BPE (Byte Pair Encoding)
The algorithm used by most LLMs to break text into tokens. It identifies frequently occurring character combinations in training data and merges them into single tokens. Common words become one token; rare strings get split into many small pieces.
Why it matters: Determines how many tokens your text consumes (and therefore costs).
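A toy illustration of one BPE merge step; real tokenizers learn tens of thousands of merges from massive corpora, so this is a sketch of the idea, not a production tokenizer:

```python
from collections import Counter

def most_frequent_pair(words):
    # Count adjacent symbol pairs across all words
    pairs = Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    # Replace every occurrence of the pair with one merged symbol
    merged = []
    for w in words:
        out, i = [], 0
        while i < len(w):
            if i + 1 < len(w) and (w[i], w[i + 1]) == pair:
                out.append(w[i] + w[i + 1])
                i += 2
            else:
                out.append(w[i])
                i += 1
        merged.append(out)
    return merged

corpus = [list("lower"), list("lowest"), list("low")]
best = most_frequent_pair(corpus)   # e.g. ("l", "o")
corpus = merge_pair(corpus, best)
```

Repeating this loop builds the merge table: frequent character sequences end up as single tokens, rare ones stay split.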
C
Chain-of-Thought (CoT)
A prompting technique where the model is asked to show its reasoning step by step before giving a final answer. Dramatically improves performance on math, logic, and complex reasoning tasks.
Example prompt: “Think step by step before answering.”
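In code, CoT is usually just prompt construction plus a parse step for the final answer. A sketch, where the "Answer:" convention is an illustrative assumption rather than a standard:

```python
def cot_prompt(question):
    # Ask for reasoning first, with a parseable final line
    return (f"{question}\n\nThink step by step, then give the final answer "
            "on a line starting with 'Answer:'.")

def extract_answer(response):
    # Pull the final answer out of the model's reasoning
    for line in response.splitlines():
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return None
```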
Chunking
Splitting documents into smaller pieces before embedding them in a vector database. How you chunk affects retrieval quality. Common strategies: fixed-size, paragraph-based, semantic (topic-change based).
Related: RAG, Embeddings, Vector Database
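A sketch of the simplest strategy, fixed-size chunking with overlap (measured in characters here; production systems usually count tokens instead):

```python
def chunk_text(text, size=500, overlap=50):
    # Slide a window of `size` characters, overlapping neighbors by `overlap`
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
    return chunks
```

The overlap means a sentence split at a chunk boundary still appears whole in one of the two neighboring chunks.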
Context Window
The maximum number of tokens an LLM can process in a single interaction. Think of it as working memory. Everything – system prompt, conversation history, your message, and the response – competes for this finite space. Ranges from 8K to 1M tokens depending on the model.
Closed Source
AI models where the weights, training data, and architecture are kept private. You access via API. Examples: GPT-5, Claude Opus 4, Gemini 2.5 Pro.
Contrast: Open Source
D
Distillation
A technique where a large, capable model “teaches” a smaller model to replicate its behavior. The smaller model learns from the larger model’s outputs, achieving surprisingly good performance at a fraction of the compute cost. This is why 8B models in 2025 can match 70B models from 2023.
E
Embedding
A numerical representation of content (text, images, audio) that captures its meaning as a list of numbers (a vector). Similar content produces similar embeddings, enabling semantic search. The foundation of RAG and vector databases.
Example: “Return policy” and “Can I get my money back?” produce similar embeddings despite using completely different words.
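Similarity between embeddings is typically measured with cosine similarity. A sketch using made-up 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors: the first two are "close in meaning", the third is not
return_policy = [0.9, 0.1, 0.2]
money_back = [0.8, 0.2, 0.3]
weather = [0.1, 0.9, 0.1]
```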
Episodic Memory
In AI agents, the storage of specific past events and experiences. “What happened before.” Records of past conversations, task outcomes, and user preferences expressed over time. Distinguished from semantic memory (facts) by being tied to specific events.
F
Few-Shot Prompting
Providing the model with examples of the desired input-output pattern before asking it to perform. Typically 2-5 examples. Significantly improves output quality and consistency for structured tasks.
Example: “Here are 3 examples of how to extract entities from text: [examples]. Now extract entities from this text: [your text]”
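Assembling such a prompt programmatically, sketched for the entity-extraction example above (the "Text:"/"Entities:" labels are an illustrative convention):

```python
def few_shot_prompt(examples, query):
    # examples: list of (input_text, expected_output) pairs
    parts = [f"Text: {text}\nEntities: {output}" for text, output in examples]
    parts.append(f"Text: {query}\nEntities:")
    return "\n\n".join(parts)

examples = [
    ("Alice works at Acme.", "Alice (PERSON), Acme (ORG)"),
    ("Bob visited Paris.", "Bob (PERSON), Paris (LOC)"),
]
prompt = few_shot_prompt(examples, "Carol joined Globex.")
```

Ending the prompt mid-pattern ("Entities:") nudges the model to complete it in the same format as the examples.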
Fine-Tuning
Further training a pre-trained model on your specific data to customize its behavior. Changes the model’s weights. More permanent than prompting but requires significant compute. Best for teaching specific style, tone, or domain behavior – not for factual knowledge (use RAG for that).
Function Calling
A model capability where the LLM can decide to invoke a predefined function (tool) with specific parameters, rather than just generating text. The system executes the function and returns results to the LLM. The foundation of AI agents.
Also called: Tool Use
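The system's side of function calling is a dispatch step: the model emits a structured call, your code executes it and returns the result. A sketch with a hypothetical `get_weather` tool:

```python
def get_weather(city):
    # Hypothetical tool: a real version would call a weather API
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def execute_tool_call(call):
    # `call` is the structured output the model produced instead of text
    fn = TOOLS[call["name"]]
    result = fn(**call["arguments"])
    # In a real agent, `result` is sent back to the model as a tool message
    return result

tool_call = {"name": "get_weather", "arguments": {"city": "Berlin"}}
```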
G
GPU (Graphics Processing Unit)
Originally designed for rendering graphics, GPUs excel at the parallel mathematical operations required for AI. Training and running LLMs requires significant GPU compute. NVIDIA dominates this market with H100, A100, and consumer GPUs like RTX 4090.
Grounding
Connecting a model’s responses to verifiable source material rather than relying solely on trained patterns. RAG is the primary grounding technique – the model responds based on retrieved documents rather than generating from memory alone. Reduces hallucination.
H
Hallucination
When an LLM generates confident-sounding statements that are factually incorrect. A structural feature of how LLMs work (pattern matching, not fact retrieval), not a fixable bug. More common on obscure topics, recent events, and numerical tasks. Primary mitigation: RAG.
I
Inference
The process of running a trained model to generate predictions or outputs. When you send a prompt to ChatGPT and get a response, that’s inference. Distinguished from training (which builds the model). Inference cost = the per-token pricing you pay.
L
LLM (Large Language Model)
An AI system trained on massive text data to predict the next token in a sequence. “Large” refers to both data volume and parameter count (billions to trillions). The engine behind ChatGPT, Claude, Gemini. Generates text, doesn’t retrieve facts.
LoRA (Low-Rank Adaptation)
A technique for fine-tuning LLMs efficiently by training only a small number of additional parameters rather than modifying the entire model. Makes fine-tuning feasible on consumer hardware. Widely used for custom open-source model adaptations.
M
MCP (Model Context Protocol)
A standard protocol (developed by Anthropic) that allows AI models to connect to external tools and data sources in a standardized way. Think of it as USB-C for AI – a single standard for connecting models to any tool. Rapidly being adopted across the AI ecosystem.
Mixture of Experts (MoE)
A model architecture where different “expert” sub-networks specialize in different types of input. Only relevant experts activate for each request, making the model more efficient. Used by Mixtral and reportedly by GPT-4.
Multimodal
The ability to process multiple types of input (text, images, audio, video) in a single model. GPT-4o, Gemini 2.5, and Claude 3.5 Sonnet are multimodal – they can analyze images alongside text.
O
Open Source (Open Weights)
AI models whose parameters are publicly released. Anyone can download, run, modify, and build on them. Examples: LLaMA 4, Mistral, DeepSeek-R1. Enables self-hosting for privacy, cost control, and customization.
Contrast: Closed Source
P
Parameters
The numerical weights learned during training that define a model’s behavior. More parameters generally mean more capacity, but the quality of training matters more. GPT-4: ~1.8T estimated. LLaMA 3: 8B-70B. The “large” in Large Language Model.
Prompt Engineering
The craft of writing effective instructions for LLMs. Includes techniques like few-shot examples, chain-of-thought, role assignment, and structured output formatting. The primary way to control LLM behavior without fine-tuning.
R
RAG (Retrieval-Augmented Generation)
A technique that augments LLM responses with information retrieved from external sources (typically a vector database). The dominant architecture for production AI agents. Reduces hallucination by grounding responses in real, retrieved data.
How it works: Query -> Embed -> Search vector DB -> Retrieve relevant chunks -> Inject into context -> Generate grounded response
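An end-to-end sketch of that pipeline. To stay self-contained, the retriever here uses word overlap as a stand-in for real embedding similarity:

```python
def embed(text):
    # Toy stand-in for an embedding model: a bag-of-words set
    return set(text.lower().replace("?", "").replace(".", "").split())

def retrieve(query, docs, k=1):
    # Rank documents by overlap with the query (real RAG: cosine over embeddings)
    return sorted(docs, key=lambda d: len(embed(query) & embed(d)), reverse=True)[:k]

docs = [
    "Refunds are available within 30 days of purchase.",
    "Shipping takes 3 to 5 business days.",
]
query = "Can I get a refund within 30 days?"
context = retrieve(query, docs)[0]
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The retrieved chunk is injected into the prompt, so the model answers from the document rather than from memory alone.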
ReAct (Reason + Act)
The most common AI agent pattern. The LLM alternates between reasoning about what to do and acting by calling tools. Think -> Act -> Observe -> Think -> … Used by LangChain, n8n AI Agent, and most production agents.
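The loop itself is small; the intelligence lives in the model. A sketch where `llm` is any callable returning either a tool call or a final answer (this tuple interface is an assumption for illustration):

```python
def react_agent(task, llm, tools, max_steps=5):
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        decision = llm(history)                     # Think
        if decision[0] == "finish":
            return decision[1]
        _, name, args = decision
        observation = tools[name](**args)           # Act
        history.append(f"Observed: {observation}")  # Observe
    return None

# Scripted "model" for demonstration: first call a tool, then finish
def scripted_llm(history):
    if len(history) == 1:
        return ("act", "add", {"a": 2, "b": 3})
    return ("finish", history[-1])

answer = react_agent("add 2 and 3", scripted_llm, {"add": lambda a, b: a + b})
```

The `max_steps` cap is the standard guard against an agent looping forever.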
RLHF (Reinforcement Learning from Human Feedback)
The training technique that aligns LLMs with human preferences. Humans rate model responses; the model is adjusted to produce more of the preferred responses. What turns a raw text predictor into a helpful assistant.
S
Semantic Memory
In AI agents, the storage of facts, concepts, and knowledge. “What things are.” Company policies, product docs, domain knowledge. Typically stored in vector databases and retrieved via RAG.
Distinguished from: Episodic memory (events) and procedural memory (skills).
Semantic Search
Finding information based on meaning rather than exact keyword matching. Powered by embeddings. “Can I return this?” matches “Refund policy” because their meanings are similar, even though the words are different.
T
Temperature
A parameter controlling randomness in LLM outputs. Low temperature (0-0.3) = more focused, deterministic responses. High temperature (0.7-1.0) = more creative, varied responses. Set to 0 for factual tasks, higher for creative tasks.
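Mechanically, temperature divides the model's logits before the softmax. A sketch showing how it reshapes the probability distribution over next tokens:

```python
import math

def temperature_softmax(logits, temperature=1.0):
    # Lower temperature sharpens the distribution toward the top token;
    # higher temperature flattens it toward uniform
    scaled = [l / max(temperature, 1e-6) for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

cold = temperature_softmax([2.0, 1.0, 0.0], temperature=0.1)   # near-greedy
hot = temperature_softmax([2.0, 1.0, 0.0], temperature=10.0)   # near-uniform
```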
Token
The basic unit of text processing for LLMs. Can be a whole word, part of a word, or punctuation. Everything – cost, context window limits, performance – traces back to token count. ~100 tokens = ~75 words.
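The ~100 tokens per ~75 words rule of thumb is easy to turn into a rough cost estimator (valid for English prose only; code and non-English text tokenize differently, and the price here is illustrative):

```python
def estimate_tokens(text):
    # Rough English-prose estimate: ~100 tokens per 75 words
    return round(len(text.split()) * 100 / 75)

def estimate_cost(text, usd_per_million_tokens=3.0):
    # Illustrative rate; check your provider's current per-token pricing
    return estimate_tokens(text) * usd_per_million_tokens / 1_000_000
```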
Transformer
The neural network architecture underlying all major LLMs (GPT, Claude, Gemini, LLaMA). Introduced by Google in 2017 (“Attention Is All You Need”). Its key innovation: the attention mechanism, which lets the model weigh the relevance of every token against every other token.
V
Vector Database
A specialized database for storing and searching embeddings (numerical representations of meaning). Enables semantic search at scale. Leading tools: Pinecone, Weaviate, Chroma, Qdrant, Supabase/pgvector.
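What a vector database does can be sketched in a few lines: store (vector, payload) pairs and return nearest neighbors by cosine similarity. The real systems listed above add approximate-nearest-neighbor indexes so this scales to millions of vectors:

```python
import math

class TinyVectorStore:
    def __init__(self):
        self.items = []  # list of (vector, payload) pairs

    def add(self, vector, payload):
        self.items.append((vector, payload))

    def search(self, query, k=1):
        # Brute-force cosine ranking; real vector DBs use ANN indexes
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.sqrt(sum(x * x for x in a)) *
                          math.sqrt(sum(y * y for y in b)))
        ranked = sorted(self.items, key=lambda item: cos(query, item[0]),
                        reverse=True)
        return [payload for _, payload in ranked[:k]]
```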
Z
Zero-Shot
Asking a model to perform a task without providing any examples. “Classify this email as spam or not spam.” Contrasted with few-shot (providing examples first). Modern LLMs are remarkably good at zero-shot tasks due to extensive pre-training.
Quick Reference Table
| Term | One-Line Definition |
|---|---|
| Agent | AI that takes actions, not just generates text |
| API | Interface for sending prompts to AI models |
| Chunking | Splitting documents for vector database storage |
| Context Window | Maximum tokens a model can process at once |
| Embedding | Numbers representing the meaning of content |
| Fine-Tuning | Further training a model on custom data |
| Hallucination | AI generating confident but incorrect information |
| Inference | Running a model to get predictions |
| LLM | Large Language Model – the AI engine |
| MCP | Model Context Protocol – standard for tool connections |
| Multimodal | Processing text, images, audio in one model |
| Parameters | Learned weights defining model behavior |
| RAG | Retrieval-Augmented Generation |
| RLHF | Training AI with human feedback |
| Semantic Search | Finding information by meaning, not keywords |
| Token | Basic unit of LLM text processing |
| Transformer | Architecture underlying all major LLMs |
| Vector Database | Storage for meaning-based search |