A
Agent
An AI system that can take actions in the real world – not just generate text. Agents use tools (APIs, databases, code execution) to complete tasks autonomously. Key difference from a chatbot: agents do things, chatbots say things.
Related: ReAct, Tool Use, Multi-Agent Systems
Alignment
The process of making an AI model’s behavior match human intentions and values. Primarily achieved through RLHF (Reinforcement Learning from Human Feedback). Alignment is what turns a raw text predictor into a helpful, safe assistant.
Why it matters: An unaligned model produces outputs that mirror all patterns in training data, including harmful ones.
Attention Mechanism
The core innovation of the Transformer architecture. Allows each token in a sequence to “look at” every other token and determine relevance. This is why LLMs can understand that “it” in “The cat sat on the mat because it was tired” refers to the cat, not the mat.
Related: Transformer, Self-Attention
API (Application Programming Interface)
A standardized way for software to communicate. In AI, this typically means sending a prompt to a model hosted on a server and receiving a response. OpenAI, Anthropic, and Google all provide APIs for their models.
Example: Your n8n workflow calls the Claude API to process customer emails.
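A minimal sketch of what such a call sends, assuming a messages-style JSON body (exact field names vary by provider, and the model name here is illustrative):

```python
import json

def build_chat_request(prompt, model="claude-example", max_tokens=512):
    # Typical messages-style request body; field names vary by provider
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

body = json.dumps(build_chat_request("Summarize this customer email: ..."))
```

This body would be POSTed to the provider's chat endpoint with your API key in a request header.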
B
BPE (Byte Pair Encoding)
The algorithm used by most LLMs to break text into tokens. It identifies frequently occurring character combinations in training data and merges them into single tokens. Common words become one token; rare strings get split into many small pieces.
Why it matters: Determines how many tokens your text consumes (and therefore costs).
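A toy illustration of one BPE merge step; real tokenizers learn tens of thousands of merges from massive corpora, so this is a sketch of the idea, not a production tokenizer:

```python
from collections import Counter

def most_frequent_pair(words):
    # Count adjacent symbol pairs across all words
    pairs = Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    # Replace every occurrence of the pair with one merged symbol
    merged = []
    for w in words:
        out, i = [], 0
        while i < len(w):
            if i + 1 < len(w) and (w[i], w[i + 1]) == pair:
                out.append(w[i] + w[i + 1])
                i += 2
            else:
                out.append(w[i])
                i += 1
        merged.append(out)
    return merged

corpus = [list("lower"), list("lowest"), list("low")]
best = most_frequent_pair(corpus)   # e.g. ("l", "o")
corpus = merge_pair(corpus, best)
```

Repeating this loop builds the merge table: frequent character sequences end up as single tokens, rare ones stay split.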
C
Chain-of-Thought (CoT)
A prompting technique where the model is asked to show its reasoning step by step before giving a final answer. Dramatically improves performance on math, logic, and complex reasoning tasks.
Example prompt: “Think step by step before answering.”
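In code, CoT is usually just prompt construction plus a parse step for the final answer. A sketch, where the "Answer:" convention is an illustrative assumption rather than a standard:

```python
def cot_prompt(question):
    # Ask for reasoning first, with a parseable final line
    return (f"{question}\n\nThink step by step, then give the final answer "
            "on a line starting with 'Answer:'.")

def extract_answer(response):
    # Pull the final answer out of the model's reasoning
    for line in response.splitlines():
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return None
```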
Chunking
Splitting documents into smaller pieces before embedding them in a vector database. How you chunk affects retrieval quality. Common strategies: fixed-size, paragraph-based, semantic (topic-change based).
Related: RAG, Embeddings, Vector Database
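A sketch of the simplest strategy, fixed-size chunking with overlap (measured in characters here; production systems usually count tokens instead):

```python
def chunk_text(text, size=500, overlap=50):
    # Slide a window of `size` characters, overlapping neighbors by `overlap`
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
    return chunks
```

The overlap means a sentence split at a chunk boundary still appears whole in one of the two neighboring chunks.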
Context Window
The maximum number of tokens an LLM can process in a single interaction. Think of it as working memory. Everything – system prompt, conversation history, your message, and the response – competes for this finite space. Ranges from 8K to 1M tokens depending on the model.
Closed Source
AI models where the weights, training data, and architecture are kept private. You access via API. Examples: GPT-5, Claude Opus 4, Gemini 2.5 Pro.
Contrast: Open Source
D
Distillation
A technique where a large, capable model “teaches” a smaller model to replicate its behavior. The smaller model learns from the larger model’s outputs, achieving surprisingly good performance at a fraction of the compute cost. This is why 8B models in 2025 can match 70B models from 2023.
E
Embedding
A numerical representation of content (text, images, audio) that captures its meaning as a list of numbers (a vector). Similar content produces similar embeddings, enabling semantic search. The foundation of RAG and vector databases.
Example: “Return policy” and “Can I get my money back?” produce similar embeddings despite using completely different words.
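Similarity between embeddings is typically measured with cosine similarity. A sketch using made-up 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors: the first two are "close in meaning", the third is not
return_policy = [0.9, 0.1, 0.2]
money_back = [0.8, 0.2, 0.3]
weather = [0.1, 0.9, 0.1]
```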
Episodic Memory
In AI agents, the storage of specific past events and experiences. “What happened before.” Records of past conversations, task outcomes, and user preferences expressed over time. Distinguished from semantic memory (facts) by being tied to specific events.
F
Few-Shot Prompting
Providing the model with examples of the desired input-output pattern before asking it to perform. Typically 2-5 examples. Significantly improves output quality and consistency for structured tasks.
Example: “Here are 3 examples of how to extract entities from text: [examples]. Now extract entities from this text: [your text]”
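Assembling such a prompt programmatically, sketched for the entity-extraction example above (the "Text:"/"Entities:" labels are an illustrative convention):

```python
def few_shot_prompt(examples, query):
    # examples: list of (input_text, expected_output) pairs
    parts = [f"Text: {text}\nEntities: {output}" for text, output in examples]
    parts.append(f"Text: {query}\nEntities:")
    return "\n\n".join(parts)

examples = [
    ("Alice works at Acme.", "Alice (PERSON), Acme (ORG)"),
    ("Bob visited Paris.", "Bob (PERSON), Paris (LOC)"),
]
prompt = few_shot_prompt(examples, "Carol joined Globex.")
```

Ending the prompt mid-pattern ("Entities:") nudges the model to complete it in the same format as the examples.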
Fine-Tuning
Further training a pre-trained model on your specific data to customize its behavior. Changes the model’s weights. More permanent than prompting but requires significant compute. Best for teaching specific style, tone, or domain behavior – not for factual knowledge (use RAG for that).
Function Calling
A model capability where the LLM can decide to invoke a predefined function (tool) with specific parameters, rather than just generating text. The system executes the function and returns results to the LLM. The foundation of AI agents.
Also called: Tool Use
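The system's side of function calling is a dispatch step: the model emits a structured call, your code executes it and returns the result. A sketch with a hypothetical `get_weather` tool:

```python
def get_weather(city):
    # Hypothetical tool: a real version would call a weather API
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def execute_tool_call(call):
    # `call` is the structured output the model produced instead of text
    fn = TOOLS[call["name"]]
    result = fn(**call["arguments"])
    # In a real agent, `result` is sent back to the model as a tool message
    return result

tool_call = {"name": "get_weather", "arguments": {"city": "Berlin"}}
```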
G
GPU (Graphics Processing Unit)
Originally designed for rendering graphics, GPUs excel at the parallel mathematical operations required for AI. Training and running LLMs requires significant GPU compute. NVIDIA dominates this market with H100, A100, and consumer GPUs like RTX 4090.
Grounding
Connecting a model’s responses to verifiable source material rather than relying solely on trained patterns. RAG is the primary grounding technique – the model responds based on retrieved documents rather than generating from memory alone. Reduces hallucination.
H
Hallucination
When an LLM generates confident-sounding statements that are factually incorrect. A structural feature of how LLMs work (pattern matching, not fact retrieval), not a fixable bug. More common on obscure topics, recent events, and numerical tasks. Primary mitigation: RAG.
I
Inference
The process of running a trained model to generate predictions or outputs. When you send a prompt to ChatGPT and get a response, that’s inference. Distinguished from training (which builds the model). Inference cost = the per-token pricing you pay.
L
LLM (Large Language Model)
An AI system trained on massive text data to predict the next token in a sequence. “Large” refers to both data volume and parameter count (billions to trillions). The engine behind ChatGPT, Claude, Gemini. Generates text, doesn’t retrieve facts.
LoRA (Low-Rank Adaptation)
A technique for fine-tuning LLMs efficiently by training only a small number of additional parameters rather than modifying the entire model. Makes fine-tuning feasible on consumer hardware. Widely used for custom open-source model adaptations.
M
MCP (Model Context Protocol)
A standard protocol (developed by Anthropic) that allows AI models to connect to external tools and data sources in a standardized way. Think of it as USB-C for AI – a single standard for connecting models to any tool. Rapidly being adopted across the AI ecosystem.
Mixture of Experts (MoE)
A model architecture where different “expert” sub-networks specialize in different types of input. Only relevant experts activate for each request, making the model more efficient. Used by Mixtral and reportedly by GPT-4.
Multimodal
The ability to process multiple types of input (text, images, audio, video) in a single model. GPT-4o, Gemini 2.5, and Claude 3.5 Sonnet are multimodal – they can analyze images alongside text.
O
Open Source (Open Weights)
AI models whose parameters are publicly released. Anyone can download, run, modify, and build on them. Examples: LLaMA 4, Mistral, DeepSeek-R1. Enables self-hosting for privacy, cost control, and customization.
Contrast: Closed Source
P
Parameters
The numerical weights learned during training that define a model’s behavior. More parameters generally mean more capacity, but the quality of training matters more. GPT-4: ~1.8T estimated. LLaMA 3: 8B-70B. The “large” in Large Language Model.
Prompt Engineering
The craft of writing effective instructions for LLMs. Includes techniques like few-shot examples, chain-of-thought, role assignment, and structured output formatting. The primary way to control LLM behavior without fine-tuning.
R
RAG (Retrieval-Augmented Generation)
A technique that augments LLM responses with information retrieved from external sources (typically a vector database). The dominant architecture for production AI agents. Reduces hallucination by grounding responses in real, retrieved data.
How it works: Query -> Embed -> Search vector DB -> Retrieve relevant chunks -> Inject into context -> Generate grounded response
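An end-to-end sketch of that pipeline. To stay self-contained, the retriever here uses word overlap as a stand-in for real embedding similarity:

```python
def embed(text):
    # Toy stand-in for an embedding model: a bag-of-words set
    return set(text.lower().replace("?", "").replace(".", "").split())

def retrieve(query, docs, k=1):
    # Rank documents by overlap with the query (real RAG: cosine over embeddings)
    return sorted(docs, key=lambda d: len(embed(query) & embed(d)), reverse=True)[:k]

docs = [
    "Refunds are available within 30 days of purchase.",
    "Shipping takes 3 to 5 business days.",
]
query = "Can I get a refund within 30 days?"
context = retrieve(query, docs)[0]
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The retrieved chunk is injected into the prompt, so the model answers from the document rather than from memory alone.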
ReAct (Reason + Act)
The most common AI agent pattern. The LLM alternates between reasoning about what to do and acting by calling tools. Think -> Act -> Observe -> Think -> … Used by LangChain, n8n AI Agent, and most production agents.
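The loop itself is small; the intelligence lives in the model. A sketch where `llm` is any callable returning either a tool call or a final answer (this tuple interface is an assumption for illustration):

```python
def react_agent(task, llm, tools, max_steps=5):
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        decision = llm(history)                     # Think
        if decision[0] == "finish":
            return decision[1]
        _, name, args = decision
        observation = tools[name](**args)           # Act
        history.append(f"Observed: {observation}")  # Observe
    return None

# Scripted "model" for demonstration: first call a tool, then finish
def scripted_llm(history):
    if len(history) == 1:
        return ("act", "add", {"a": 2, "b": 3})
    return ("finish", history[-1])

answer = react_agent("add 2 and 3", scripted_llm, {"add": lambda a, b: a + b})
```

The `max_steps` cap is the standard guard against an agent looping forever.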
RLHF (Reinforcement Learning from Human Feedback)
The training technique that aligns LLMs with human preferences. Humans rate model responses; the model is adjusted to produce more of the preferred responses. What turns a raw text predictor into a helpful assistant.
S
Semantic Memory
In AI agents, the storage of facts, concepts, and knowledge. “What things are.” Company policies, product docs, domain knowledge. Typically stored in vector databases and retrieved via RAG.
Distinguished from: Episodic memory (events) and procedural memory (skills).
Semantic Search
Finding information based on meaning rather than exact keyword matching. Powered by embeddings. “Can I return this?” matches “Refund policy” because their meanings are similar, even though the words are different.
T
Temperature
A parameter controlling randomness in LLM outputs. Low temperature (0-0.3) = more focused, deterministic responses. High temperature (0.7-1.0) = more creative, varied responses. Set to 0 for factual tasks, higher for creative tasks.
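Mechanically, temperature divides the model's logits before the softmax. A sketch showing how it reshapes the probability distribution over next tokens:

```python
import math

def temperature_softmax(logits, temperature=1.0):
    # Lower temperature sharpens the distribution toward the top token;
    # higher temperature flattens it toward uniform
    scaled = [l / max(temperature, 1e-6) for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

cold = temperature_softmax([2.0, 1.0, 0.0], temperature=0.1)   # near-greedy
hot = temperature_softmax([2.0, 1.0, 0.0], temperature=10.0)   # near-uniform
```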
Token
The basic unit of text processing for LLMs. Can be a whole word, part of a word, or punctuation. Everything – cost, context window limits, performance – traces back to token count. ~100 tokens = ~75 words.
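The ~100 tokens per ~75 words rule of thumb is easy to turn into a rough cost estimator (valid for English prose only; code and non-English text tokenize differently, and the price here is illustrative):

```python
def estimate_tokens(text):
    # Rough English-prose estimate: ~100 tokens per 75 words
    return round(len(text.split()) * 100 / 75)

def estimate_cost(text, usd_per_million_tokens=3.0):
    # Illustrative rate; check your provider's current per-token pricing
    return estimate_tokens(text) * usd_per_million_tokens / 1_000_000
```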
Transformer
The neural network architecture underlying all major LLMs (GPT, Claude, Gemini, LLaMA). Introduced by Google in 2017 (“Attention Is All You Need”). Its key innovation: the attention mechanism, which lets the model weigh the relevance of every token against every other token.
V
Vector Database
A specialized database for storing and searching embeddings (numerical representations of meaning). Enables semantic search at scale. Leading tools: Pinecone, Weaviate, Chroma, Qdrant, Supabase/pgvector.
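What a vector database does can be sketched in a few lines: store (vector, payload) pairs and return nearest neighbors by cosine similarity. The real systems listed above add approximate-nearest-neighbor indexes so this scales to millions of vectors:

```python
import math

class TinyVectorStore:
    def __init__(self):
        self.items = []  # list of (vector, payload) pairs

    def add(self, vector, payload):
        self.items.append((vector, payload))

    def search(self, query, k=1):
        # Brute-force cosine ranking; real vector DBs use ANN indexes
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.sqrt(sum(x * x for x in a)) *
                          math.sqrt(sum(y * y for y in b)))
        ranked = sorted(self.items, key=lambda item: cos(query, item[0]),
                        reverse=True)
        return [payload for _, payload in ranked[:k]]
```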
Z
Zero-Shot
Asking a model to perform a task without providing any examples. “Classify this email as spam or not spam.” Contrasted with few-shot (providing examples first). Modern LLMs are remarkably good at zero-shot tasks due to extensive pre-training.
Quick Reference Table
| Term | One-Line Definition |
|---|---|
| Agent | AI that takes actions, not just generates text |
| API | Interface for sending prompts to AI models |
| Chunking | Splitting documents for vector database storage |
| Context Window | Maximum tokens a model can process at once |
| Embedding | Numbers representing the meaning of content |
| Fine-Tuning | Further training a model on custom data |
| Hallucination | AI generating confident but incorrect information |
| Inference | Running a model to get predictions |
| LLM | Large Language Model – the AI engine |
| MCP | Model Context Protocol – standard for tool connections |
| Multimodal | Processing text, images, audio in one model |
| Parameters | Learned weights defining model behavior |
| RAG | Retrieval-Augmented Generation |
| RLHF | Training AI with human feedback |
| Semantic Search | Finding information by meaning, not keywords |
| Token | Basic unit of LLM text processing |
| Transformer | Architecture underlying all major LLMs |
| Vector Database | Storage for meaning-based search |