AI Terminology Glossary

50+ Terms Every Leader Must Know

v1
April 17, 2026
🔄 Auto-updated weekly

Transformer
The neural network architecture that powers all modern LLMs. Introduced in the 2017 paper "Attention Is All You Need" by Google Brain researchers. Processes words in parallel rather than sequentially, enabling massive scale. Every major model – GPT, Claude, Gemini, LLaMA, Qwen – is transformer-based. It is the single most important architectural innovation in modern AI.
Token
The basic unit of text that an LLM processes. A token is roughly three-quarters of an English word: common words like "the" are a single token, while a longer word like "hamburger" may split into several ("ham", "bur", "ger"). Models are priced and limited by token counts; a typical English word averages about 1.3 tokens. Understanding tokens is essential for managing API costs and context window limits.
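The rough "four characters per token" rule of thumb can be sketched in a few lines of Python. This is only an estimate for English text; real tokenizers (BPE and similar) will differ, especially for other languages.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text.

    Uses the common rule of thumb that one token is about four
    characters (~0.75 English words). A real tokenizer will differ.
    """
    return max(1, round(len(text) / 4))

prompt = "Understanding tokens is essential for managing API costs."
print(estimate_tokens(prompt))  # 14 (57 characters / 4, rounded)
```

Estimates like this are useful for budgeting; for exact counts, use the tokenizer shipped with the model you are calling.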
Context Window
The maximum amount of text (measured in tokens) that a model can process in a single interaction. Early models had 2K-token windows; current frontier models support 128K to 1M+ tokens. A larger context window allows processing entire documents, codebases, or conversation histories – but increases computational cost quadratically in standard transformer architectures.
Parameters
The numerical values (weights) inside a neural network that are adjusted during training. Model size is measured by parameter count: 7B (7 billion) is considered small, 70B mid-range, and 400B+ frontier. More parameters generally mean greater capability but higher computational cost for both training and inference. GPT-4 class models are estimated at over 1 trillion parameters.
RLHF (Reinforcement Learning from Human Feedback)
A training technique where human raters evaluate model outputs, creating a reward signal that teaches the model to produce more helpful, harmless responses. Pioneered in research by OpenAI and DeepMind, and popularized by ChatGPT. Now used with variations by Anthropic (constitutional AI), Google, Meta, and most major AI companies. RLHF is what transforms a raw language model into a useful assistant.
Fine-tuning
The process of taking a pre-trained model and training it further on a specific dataset to adapt it for particular tasks or domains. A medical company might fine-tune a general LLM on clinical literature to create a specialized medical assistant. Fine-tuning requires far less data and compute than training from scratch โ€” often achievable in hours on a single GPU.
RAG (Retrieval-Augmented Generation)
A technique that enhances LLM responses by first retrieving relevant documents from an external knowledge base, then feeding those documents to the model as context. RAG addresses hallucination and staleness problems without retraining. Most enterprise AI deployments use RAG to ground model outputs in authoritative, up-to-date company data.
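The retrieve-then-generate flow can be sketched with a toy retriever. Real systems score documents with embeddings rather than the word overlap used here, and the document texts and prompt wording are purely illustrative.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Score documents by word overlap with the query (a stand-in
    for embedding-based retrieval) and return the top k."""
    q_words = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Assemble the retrieved passages plus the question into one prompt."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Refund requests require an order number.",
]
print(build_rag_prompt("How long do refunds take?", docs))
```

The assembled prompt is then sent to the LLM, which answers from the supplied context rather than from memory – the grounding step that reduces hallucination.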
Prompt Engineering
The practice of crafting input text (prompts) to elicit desired outputs from an LLM. Includes techniques like system prompts, few-shot examples, chain-of-thought instructions, and role assignment. Prompt engineering is the primary interface between humans and foundation models – a new form of programming that uses natural language rather than code.
Hallucination
When an LLM generates plausible-sounding but factually incorrect information with apparent confidence. Hallucination is the most significant reliability problem in production AI systems. It occurs because models generate statistically likely text rather than retrieving verified facts. Mitigation strategies include RAG, grounding, and output verification systems.
Multimodal
A model that can process and/or generate multiple types of data – text, images, audio, video, or code – within a single architecture. GPT-4o, Gemini, and Claude 3.5 are multimodal. This capability enables applications like analyzing images, transcribing audio, generating visual content, and understanding documents with charts and diagrams.
Foundation Model
A large AI model trained on broad data that can be adapted to many downstream tasks. The term was coined by Stanford's HAI Institute in 2021. Foundation models (GPT-4, LLaMA, Gemini) serve as the base layer upon which applications are built. The foundation model paradigm has largely replaced the older practice of training specialized models for each task from scratch.
LLM (Large Language Model)
A neural network trained on massive text corpora to understand and generate human language. LLMs are trained to predict the next token in a sequence; at sufficient scale, this simple objective gives rise to seemingly intelligent behavior. The "large" refers to both parameter count (billions to trillions) and training data (trillions of tokens). ChatGPT, Claude, and Gemini are all LLMs.
Mixture of Experts (MoE)
An architecture that activates only a subset of a model's parameters for each input, reducing computational cost while maintaining large total parameter counts. DeepSeek-V3 (671B total, 37B active) and Qwen3-235B (22B active) use MoE. This approach allows models to be large and knowledgeable while remaining efficient at inference time.
Quantization
Reducing the precision of model weights (e.g., from 16-bit to 4-bit numbers) to decrease memory usage and inference cost with minimal quality loss. A quantized 70B model might run on consumer hardware that the full-precision version could not. Common formats include INT8, INT4, and GGUF. Quantization is essential for deploying AI on edge devices and controlling cloud costs.
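A minimal sketch of symmetric integer quantization shows the core trade: production formats such as INT4 or GGUF add per-group scales and other refinements that this toy version omits, and the weight values are invented for illustration.

```python
def quantize(weights: list[float], bits: int = 4):
    """Symmetric quantization: map floats onto integers in
    [-2**(bits-1), 2**(bits-1) - 1] using a single scale factor."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) for w in weights], scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate floats by multiplying back by the scale."""
    return [v * scale for v in q]

weights = [0.42, -1.37, 0.08, 0.91]
q, scale = quantize(weights, bits=4)
restored = dequantize(q, scale)
# 4-bit storage loses some precision but preserves the overall shape.
print(q, [round(w, 2) for w in restored])
```

Each weight now fits in 4 bits instead of 16, a 4x memory saving, at the cost of the small rounding errors visible in the restored values.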
Agentic AI
AI systems that can autonomously plan, execute multi-step tasks, use tools, and make decisions with minimal human oversight. Unlike chatbots that respond to single prompts, agents can break complex goals into subtasks, call external APIs, browse the web, write and execute code, and iterate on their own output. Agentic AI is widely seen as the next major paradigm in AI applications.
MCP (Model Context Protocol)
An open protocol introduced by Anthropic in late 2024 that standardizes how AI models connect to external data sources and tools. MCP provides a universal interface for models to access databases, file systems, APIs, and other services. Similar to how USB standardized hardware connections, MCP aims to standardize AI-to-tool integration, reducing custom development work.
Embedding
A numerical representation (vector) of text, images, or other data in a high-dimensional space where similar items are close together. Embeddings enable semantic search, clustering, and similarity comparison. When you search for "car" and get results about "automobile," embeddings are why. They are the foundation of modern search and retrieval systems.
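Similarity between embeddings is usually measured with cosine similarity. A hand-rolled version with made-up three-dimensional vectors (real embeddings have hundreds or thousands of dimensions) shows the idea:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 means
    similar direction (similar meaning), near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-dimensional "embeddings", invented for illustration.
car = [0.9, 0.1, 0.0]
automobile = [0.85, 0.15, 0.05]
banana = [0.0, 0.2, 0.95]

print(round(cosine_similarity(car, automobile), 3))  # high: same concept
print(round(cosine_similarity(car, banana), 3))      # low: unrelated
```

This single number is what a vector database computes, at scale, to decide which stored items best match a query.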
Vector Database
A specialized database optimized for storing and querying embedding vectors. Examples include Pinecone, Weaviate, Milvus, and Chroma. Vector databases power RAG systems by finding the most semantically relevant documents for a given query. They are the infrastructure layer that makes AI-grounded retrieval possible at scale.
Semantic Search
Search that understands the meaning of queries rather than just matching keywords. Powered by embeddings and vector databases, semantic search returns results based on conceptual relevance. Asking "How do I reduce expenses?" can find documents about "cost optimization" even without keyword overlap. It represents a fundamental upgrade over traditional keyword-based search.
LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning technique that trains small adapter modules alongside a frozen base model rather than modifying all weights. LoRA reduces fine-tuning costs by 90%+ while achieving comparable results to full fine-tuning. It has become the standard approach for customizing foundation models for specific domains, tasks, or organizational needs.
Inference
The process of using a trained model to generate outputs – the production phase as opposed to training. Inference is what happens every time you send a prompt to ChatGPT. It is the primary ongoing cost of deployed AI systems. Inference optimization (quantization, batching, caching, specialized hardware) is a major focus for reducing AI operational costs.
Training
The process of teaching a model by adjusting its parameters on large datasets. Pre-training learns general knowledge from internet-scale data. Fine-tuning adapts the model for specific tasks. Training a frontier model costs millions to hundreds of millions of dollars and requires thousands of GPUs running for months. It is the capital-intensive foundation of all AI capability.
Neural Network
A computational architecture loosely inspired by biological brains, consisting of layers of interconnected nodes (neurons) that process information. Neural networks are the backbone of modern AI. They learn patterns from data rather than being explicitly programmed. Deep neural networks with many layers power everything from image recognition to language generation.
Deep Learning
A subset of machine learning using neural networks with many layers (hence "deep"). Deep learning drove the AI revolution starting around 2012, achieving breakthroughs in image recognition, speech processing, and natural language understanding. All modern LLMs, image generators, and speech systems use deep learning. The term distinguishes modern multi-layer architectures from earlier shallow machine learning approaches.
Backpropagation
The algorithm used to train neural networks by calculating how much each parameter contributed to errors in the output, then adjusting parameters to reduce those errors. Proposed in the 1980s and enabled by GPU computing in the 2010s, backpropagation is the mathematical engine that makes deep learning work. Every trained neural network you use was optimized through backpropagation.
Gradient Descent
The optimization algorithm at the heart of neural network training. It iteratively adjusts model parameters in the direction that reduces errors, like rolling a ball downhill to find the lowest point in a landscape. Variants like SGD (Stochastic Gradient Descent), Adam, and AdamW are standard in modern training. Understanding gradient descent helps explain why training requires massive compute – each step updates billions of parameters.
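A worked one-dimensional example makes the "ball rolling downhill" picture concrete: minimizing f(x) = (x - 3)^2, whose gradient is 2(x - 3), with plain gradient descent. The learning rate and step count are arbitrary illustrative choices.

```python
def gradient_descent(grad, x0: float, lr: float = 0.1, steps: int = 100) -> float:
    """Repeatedly step opposite the gradient to minimize a function."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # move downhill by lr times the slope
    return x

# Minimize f(x) = (x - 3)^2; its gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(minimum, 4))  # converges to 3.0, the true minimum
```

Training an LLM is this same loop, except x is billions of parameters and the gradient comes from backpropagation over batches of training data.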
Attention Mechanism
The core innovation behind transformers that allows a model to focus on relevant parts of the input when generating each part of the output. Self-attention lets every token attend to every other token, capturing relationships across the entire context. Multi-head attention enables the model to attend to different types of relationships simultaneously. Attention is what makes transformers so powerful – and so computationally expensive.
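Scaled dot-product attention can be sketched for a single query in pure Python. The two-dimensional vectors are hand-picked toys; real models use hundreds of dimensions and process all queries at once as matrices.

```python
import math

def softmax(xs: list[float]) -> list[float]:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector:
    weight each value by how well its key matches the query."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# The query matches the first key most strongly, so the output
# is pulled toward the first value vector.
out = attention(query=[1.0, 0.0],
                keys=[[1.0, 0.0], [0.0, 1.0]],
                values=[[10.0, 0.0], [0.0, 10.0]])
print([round(v, 2) for v in out])
```

Because every token's query is scored against every other token's key, cost grows quadratically with context length – the expense noted above.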
Encoder-Decoder
A neural network architecture with two components: an encoder that processes input into a compressed representation, and a decoder that generates output from that representation. The original Transformer paper used this design for translation. Modern LLMs typically use decoder-only architectures, but encoder-decoder structures persist in specialized models for translation, summarization, and multimodal tasks.
Autoregressive
A generation method where the model produces output one token at a time, with each new token conditioned on all previous tokens. GPT, Claude, and virtually all modern LLMs are autoregressive – they literally predict the next word, then the next, building coherent text token by token. This sequential generation is why inference takes time proportional to output length.
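The loop itself is simple. Here a toy lookup table stands in for a real model, which would output a probability distribution over tens of thousands of tokens at each step; the vocabulary is invented for illustration.

```python
# Toy stand-in for a language model: maps each token to its
# single most likely successor.
NEXT_TOKEN = {
    "<start>": "the", "the": "model", "model": "writes",
    "writes": "text", "text": "<end>",
}

def generate(model: dict, max_tokens: int = 10) -> list:
    """Autoregressive loop: each step conditions on what has been
    generated so far (here, just the previous token)."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        nxt = model.get(tokens[-1])
        if nxt is None or nxt == "<end>":
            break
        tokens.append(nxt)
    return tokens[1:]

print(generate(NEXT_TOKEN))  # ['the', 'model', 'writes', 'text']
```

Each pass through the loop corresponds to one forward pass through the model in real inference, which is why longer outputs take proportionally longer.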
Temperature
A parameter (typically 0 to 1, though some APIs allow up to 2) that controls the randomness of model outputs. Temperature 0 always selects the most likely next token (deterministic, focused). Higher temperature increases randomness, producing more creative but less predictable outputs. Temperature is the primary dial for adjusting the tradeoff between consistency and creativity in AI-generated content.
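The effect is easy to see by applying a temperature-scaled softmax to some made-up raw model scores (logits). Note that temperature 0 is handled as a plain argmax in practice, since dividing by zero is undefined.

```python
import math

def apply_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits into next-token probabilities, scaled by
    temperature. Lower temperature sharpens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                 # invented scores for 3 tokens
cold = apply_temperature(logits, 0.2)    # near-deterministic
hot = apply_temperature(logits, 1.5)     # flatter, more random
print([round(p, 3) for p in cold])
print([round(p, 3) for p in hot])
```

At low temperature almost all probability mass lands on the top token; at high temperature the alternatives get real chances of being sampled.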
Top-p (Nucleus Sampling)
A sampling technique that restricts token selection to the smallest set of tokens whose cumulative probability exceeds threshold p (e.g., 0.9). Unlike temperature, which scales all probabilities, top-p dynamically adjusts the candidate pool based on the distribution shape. Top-p = 0.9 means the model chooses from tokens covering 90% of the probability mass. It is often combined with temperature for fine-grained output control.
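The filtering step can be sketched with an invented toy distribution; a real sampler would then draw randomly from the renormalized survivors.

```python
def top_p_filter(probs: dict, p: float = 0.9) -> dict:
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, then renormalize so the survivors sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}
print(top_p_filter(probs, p=0.9))  # the low-probability "rock" is excluded
```

When the model is confident, the kept set is tiny; when the distribution is flat, many candidates survive – the dynamic behavior that distinguishes top-p from a fixed cutoff.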
Chain-of-Thought (CoT)
A prompting technique that instructs the model to show its reasoning step-by-step before giving a final answer. CoT dramatically improves performance on mathematical, logical, and complex reasoning tasks. Research showed that even simply adding "think step by step" to a prompt can improve results. Reasoning models like OpenAI's o1 and DeepSeek's R1 automate this process internally.
Zero-shot
Asking a model to perform a task without providing any examples – relying entirely on its pre-trained knowledge and the task description. Zero-shot capability is a key measure of a foundation model's generality. Strong zero-shot performance means less need for task-specific training data or examples, reducing deployment friction for new use cases.
Few-shot
Providing a small number of examples (typically 2-5) in the prompt to demonstrate the desired task format and style before asking the model to perform it. Few-shot prompting often dramatically improves output quality compared to zero-shot. It is the simplest and most commonly used technique for steering model behavior without fine-tuning.
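Assembling a few-shot prompt is plain string construction. The sentiment-labeling task and the "Review:/Sentiment:" format here are invented purely to illustrate the pattern.

```python
def few_shot_prompt(examples: list, query: str) -> str:
    """Prepend labeled input/output examples so the model can infer
    the task and answer format before seeing the real query."""
    blocks = [f"Review: {text}\nSentiment: {label}"
              for text, label in examples]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)

examples = [
    ("The food was amazing.", "positive"),
    ("Terrible service, never again.", "negative"),
]
print(few_shot_prompt(examples, "Decent, but overpriced."))
```

The prompt ends mid-pattern, so the model's most natural continuation is a label in the demonstrated format – no fine-tuning required.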
Benchmark
A standardized test used to evaluate and compare AI model capabilities. Benchmarks assess areas like general knowledge (MMLU), coding (HumanEval), mathematical reasoning (GSM8K), and instruction following. Rankings on benchmarks drive competitive dynamics and purchasing decisions. Critics note that benchmark performance may not reflect real-world utility and that models can be over-optimized for benchmark scores.
MMLU (Massive Multitask Language Understanding)
The most widely cited benchmark for LLM evaluation, testing knowledge across 57 subjects including STEM, humanities, law, and medicine at varying difficulty levels. Scores range from 0-100%; frontier models now exceed 90%. MMLU has become the de facto standard for comparing model intelligence, though its limitations (static questions, potential data contamination) are well-documented.
HumanEval
A benchmark that tests AI coding ability by presenting 164 Python programming problems of varying difficulty. Models must generate function implementations that pass a suite of unit tests. HumanEval is the standard measure of code generation capability. Top models now pass over 90% of problems, reflecting AI's growing competence as a programming assistant.
FLOPS (Floating Point Operations Per Second)
A measure of computational performance – how many mathematical operations a processor can perform per second. AI training and inference are measured in FLOPS (or petaFLOPS, exaFLOPS). An H100 GPU delivers approximately 2,000 teraFLOPS for FP8 operations. Understanding FLOPS helps executives evaluate hardware requirements and compare the computational economics of different AI approaches.
GPU (Graphics Processing Unit)
The hardware workhorse of AI. Originally designed for rendering video games, GPUs excel at the parallel mathematical operations required for neural network training and inference. Nvidia dominates the market with its H100, H200, and Blackwell series. GPU scarcity and cost are the primary constraints on AI development capacity worldwide. A single H100 costs $25,000-40,000.
TPU (Tensor Processing Unit)
Google's custom-designed AI accelerator chip, purpose-built for neural network workloads. TPUs power Google's internal AI services (Search, Gemini, YouTube) and are available to cloud customers via Google Cloud. While less versatile than GPUs, TPUs can offer superior price-performance for specific training and inference workloads. Recent generations include TPU v5p and Trillium.
CUDA
Nvidia's parallel computing platform and programming model that enables developers to use GPUs for general-purpose computing, including AI. CUDA is the dominant software ecosystem for AI development – virtually all major AI frameworks (PyTorch, TensorFlow, JAX) are CUDA-optimized. CUDA's ecosystem lock-in is one of Nvidia's most powerful competitive moats.
Model Weights
The numerical parameters (often billions of them) that encode a model's learned knowledge. Weights are what you download when you obtain an open-source model. They are the product of training – the compressed representation of all patterns learned from training data. Protecting proprietary weights is a major security concern for closed-source AI companies.
Open-weight
Models whose trained parameters are publicly available for download, inspection, and modification. Examples include Meta's LLaMA, Alibaba's Qwen, and DeepSeek's models. Open-weight does not necessarily mean open-source in the traditional software sense – the training code and data may remain proprietary. The open-weight vs. closed-source debate is the defining ideological divide in the AI industry.
Closed-source
AI models whose architecture, training data, and weights are kept proprietary by their creators. OpenAI's GPT-4, Google's Gemini, and Anthropic's Claude are closed-source, accessible only through paid APIs. Proponents argue this enables better safety oversight and commercial protection. Critics contend it creates dangerous concentration of power and prevents independent safety auditing.
Constitutional AI
A training methodology developed by Anthropic where an AI model is guided by a set of principles (a "constitution") that govern its behavior. The model critiques and revises its own outputs against these principles, reducing the need for human feedback. It is Anthropic's alternative to RLHF, designed to create models that are helpful, harmless, and honest by design rather than by human-imposed constraints.
Red Teaming
The practice of systematically attacking or probing AI systems to discover vulnerabilities, biases, dangerous capabilities, and failure modes before deployment. Named after military adversarial testing, AI red teaming involves trying to make the model produce harmful outputs, reveal training data, or behave in unintended ways. Major AI companies now conduct red teaming as a standard pre-release safety measure.
Alignment
The challenge of ensuring that AI systems pursue goals that match human values and intentions. An aligned model does what humans actually want, not what they literally say, and avoids harmful behaviors even when prompted. Alignment research encompasses RLHF, constitutional AI, interpretability, and safety constraints. It is considered one of the most important unsolved problems in AI as systems grow more capable.
AGI (Artificial General Intelligence)
AI that can perform any intellectual task a human can, with the flexibility to learn and adapt across domains. AGI does not yet exist. Major AI companies (OpenAI, Google DeepMind, Anthropic) explicitly state AGI as their goal. There is no consensus definition, which allows companies to claim progress toward a moving target. If achieved, AGI would represent one of the most significant technological milestones in human history.
Superintelligence
AI that significantly exceeds human cognitive performance across all domains – not just matching but vastly surpassing human intelligence. Superintelligence remains theoretical but is the subject of intense research and policy debate. Concerns center on whether humans could control or even comprehend a system dramatically smarter than any person. Leading AI scientists have called superintelligence risk an existential concern requiring proactive governance.
Diffusion Model
The architecture behind modern AI image and video generation (Midjourney, DALL-E, Stable Diffusion). Diffusion models learn to create data by reversing a noise-adding process – starting from random static and gradually refining it into coherent images. The approach displaced GANs for visual generation due to superior quality and stability. Diffusion models also power emerging applications in protein design and molecular simulation.
GAN (Generative Adversarial Network)
An architecture where two neural networks – a generator and a discriminator – compete against each other. The generator creates synthetic data, while the discriminator tries to distinguish it from real data. GANs powered the first wave of AI-generated images (deepfakes) but have been largely superseded by diffusion models for visual content. They remain important in data augmentation, scientific simulation, and specialized generation tasks.
Tokenization
The process of converting raw text into the numerical tokens that a model processes. Tokenizers break words into subword units using algorithms like BPE (Byte Pair Encoding) or SentencePiece. The tokenizer determines how efficiently a model represents different languages โ€” English is typically efficient, while many other languages require more tokens per word, increasing costs and reducing effective context length for non-English users.
KV Cache
A memory optimization technique used during autoregressive inference that stores previously computed key-value pairs from the attention mechanism. Without KV caching, generating each new token would require recomputing attention over all previous tokens, making inference quadratically more expensive. KV caching reduces this to linear cost, dramatically speeding up text generation. Managing KV cache memory is a key engineering challenge in serving LLMs at scale.
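A back-of-the-envelope cost model shows the saving. This counts only query-key dot products and ignores everything else (value aggregation, feed-forward layers, memory traffic), so treat it as an illustration of the scaling, not a real profile.

```python
def attention_cost(n_tokens: int, cached: bool) -> int:
    """Count query-key dot products needed to generate n_tokens.

    Without a cache, step t re-attends over all t prefix positions
    for every position (~t*t work per step); with a KV cache, only
    the new token's query attends over the stored keys (~t work).
    """
    cost = 0
    for t in range(1, n_tokens + 1):
        cost += t if cached else t * t
    return cost

print(attention_cost(1000, cached=True))   # grows like n^2 / 2 in total
print(attention_cost(1000, cached=False))  # grows like n^3 / 3 in total
```

For a 1,000-token generation the cached version does roughly 500 thousand score computations versus roughly 334 million without – which is why every production LLM server uses KV caching.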
Reasoning Models
AI models specifically designed to perform extended internal thinking before producing an answer, mimicking human deliberation. OpenAI's o1/o3, DeepSeek's R1, and similar models generate hidden "chain-of-thought" reasoning traces that improve performance on math, coding, and logic problems. Reasoning models represent a paradigm shift from fast pattern-matching to slower, more deliberate problem-solving – trading inference cost for accuracy.
Tool Use
The ability of an AI model to invoke external tools, APIs, or services to accomplish tasks beyond its native capabilities. A model might use a calculator for arithmetic, a search engine for current information, or a code interpreter for data analysis. Tool use transforms LLMs from isolated text generators into components of larger automated workflows, dramatically expanding their practical utility.
Function Calling
A structured interface that allows LLMs to generate formatted requests to call specific external functions or APIs. Rather than outputting free-form text, the model produces a structured JSON object specifying which function to call and with what parameters. Function calling is the technical backbone of tool use, enabling reliable integration between AI models and enterprise systems, databases, and third-party services.
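The round trip looks roughly like this. The `get_weather` tool, its parameters, and the JSON shape are illustrative, not any vendor's exact schema, and the weather value is a hard-coded stub rather than a real API call.

```python
import json

def get_weather(city: str, unit: str) -> str:
    """Stub standing in for a real weather API."""
    return f"22 degrees {unit} in {city}"

def dispatch(raw: str, tools: dict) -> str:
    """Parse the model's structured JSON output and route it to the
    matching Python function with the supplied arguments."""
    call = json.loads(raw)
    return tools[call["name"]](**call["arguments"])

# The kind of structured call an LLM emits instead of free-form text.
model_output = (
    '{"name": "get_weather", '
    '"arguments": {"city": "Berlin", "unit": "celsius"}}'
)

print(dispatch(model_output, {"get_weather": get_weather}))
```

In a full agent loop, the function's return value is fed back to the model as a new message, letting it compose a final natural-language answer from live data.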

This entry is part of the CXO Academy AI Encyclopedia – updated weekly.