Large Language Models:
How does ChatGPT work?
ChatGPT, Gemini, Claude and GPT-4 have changed how millions of people interact with computers. This lesson demystifies how they actually work - and why they sometimes state things that are completely wrong with complete confidence.
In 2023, a New York lawyer named Steven Schwartz submitted a legal brief citing six cases as precedents.
He had asked ChatGPT to help with research. The cases looked convincing - correct formatting,
plausible case names, realistic court references. Every single one of them was completely fabricated.
None of them existed.
When the judge asked Schwartz to produce the case documents, he could not. He told the court he had
not realised that ChatGPT could produce false information. He was fined and sanctioned.
ChatGPT had not lied. It had done exactly what it is designed to do: produce plausible-sounding text.
It just had no mechanism to know whether that text was true.
Mata v. Avianca, Inc. Southern District of New York, 2023.
Tokens: the atoms of language models
A large language model does not read words the way humans do. It reads tokens - chunks of characters that are more fine-grained than words but coarser than individual letters. Common words are single tokens, so "The cat sat on the mat" breaks into one token per word. Rare words are split into multiple tokens, so a rarer phrase breaks into more tokens than it has words.
GPT-4 Turbo has a context window of 128,000 tokens - roughly 100,000 words of English text. This is everything the model can "see" at once. Nothing outside the context window affects its output.
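The splitting can be sketched in a few lines of pure Python. This is a toy greedy longest-match tokenizer over an invented mini-vocabulary - real LLM tokenizers use byte-pair encoding learned from data - but it shows the same effect: common words stay whole, rare words fragment.

```python
# Invented mini-vocabulary: a few frequent words plus common fragments.
VOCAB = {"the", "cat", "sat", "on", "mat",
         "anti", "dis", "establish", "ment", "arian", "ism"}
VOCAB |= set("abcdefghijklmnopqrstuvwxyz")  # single letters as a fallback

def tokenize(text):
    """Greedy longest-match tokenization of a lowercase word sequence."""
    tokens = []
    for word in text.lower().split():
        i = 0
        while i < len(word):
            # Take the longest vocabulary entry that matches at position i.
            for j in range(len(word), i, -1):
                if word[i:j] in VOCAB:
                    tokens.append(word[i:j])
                    i = j
                    break
    return tokens

print(tokenize("The cat sat on the mat"))
# one token per word: ['the', 'cat', 'sat', 'on', 'the', 'mat']
print(tokenize("antidisestablishmentarianism"))
# one rare word fragments into six tokens:
# ['anti', 'dis', 'establish', 'ment', 'arian', 'ism']
```

The single-letter fallback guarantees any word can be tokenized; real tokenizers do the same at the byte level, so no input is ever out of vocabulary.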
Next-token prediction at an extraordinary scale
The core task of an LLM during training is deceptively simple: predict the next token given all the tokens before it. Given "The cat sat on the ___", what comes next? Given millions of examples of human writing, the model learns which tokens tend to follow which sequences.
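The idea can be illustrated with a toy next-token predictor that simply counts which token follows which in a tiny invented corpus. An LLM does conceptually the same job, but with a neural network that generalises across billions of examples instead of a lookup table.

```python
from collections import Counter, defaultdict

# Tiny invented corpus for illustration.
corpus = ("the cat sat on the mat . "
          "the dog sat on the rug . "
          "the cat sat on the mat .").split()

# Count, for each token, which tokens follow it and how often.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(token):
    """Return the token most often seen after `token` in the corpus."""
    return follows[token].most_common(1)[0][0]

print(predict_next("sat"))  # → 'on' ('sat' is always followed by 'on' here)
print(follows["the"])       # 'cat' and 'mat' are the most frequent continuations
```

Note what this toy model cannot do: it has no notion of truth, only of frequency. That limitation carries over, in a more subtle form, to full-scale LLMs.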
This is made possible by the attention mechanism - the key innovation in the Transformer architecture (from the 2017 paper "Attention Is All You Need" by Vaswani et al.). Attention allows the model to consider every previous token when predicting the next one, and to weight some tokens as more relevant than others. When the model processes "The doctor picked up her coat", attention lets it connect "her" back to "doctor", even across many intervening tokens.
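A minimal sketch of the scaled dot-product attention at the heart of the Transformer, in pure Python with made-up 2-dimensional vectors (real models use hundreds of dimensions and learned projections, so every number here is an illustrative assumption):

```python
import math

def softmax(xs):
    """Turn raw scores into positive weights that sum to 1."""
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query over a token sequence.

    Each token contributes a key (used for matching against the query)
    and a value (its content). The output is one weight per token plus
    the weighted blend of the values.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    blended = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return weights, blended

# Invented toy vectors for the tokens "The", "doctor", "picked", "up", "her".
tokens = ["The", "doctor", "picked", "up", "her"]
keys   = [[0.1, 0.0], [0.9, 0.8], [0.2, 0.1], [0.1, 0.2], [0.8, 0.9]]
values = keys            # for simplicity, values equal keys here
query  = [1.0, 1.0]      # what the model is "looking for" at this step

weights, _ = attention(query, keys, values)
for token, w in zip(tokens, weights):
    print(f"{token:>7}: {w:.2f}")
# 'doctor' and 'her' get the largest weights: their key vectors point in
# the same direction as the query, so attention links them together.
```

The weights are recomputed at every step for every token, which is what lets the model relate words regardless of how far apart they sit in the context window.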
After pre-training (learning from raw text), LLMs go through RLHF (Reinforcement Learning from Human Feedback). Human raters compare pairs of model responses and indicate which is better. The model learns to produce responses that humans prefer - which is why ChatGPT feels helpful rather than just statistically plausible.
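The comparison step is commonly modelled with the Bradley-Terry formulation used to train the reward model: the probability that a rater prefers response A over response B is a sigmoid of the gap between their reward scores. A toy version in pure Python (the scores are invented for illustration):

```python
import math

def preference_probability(reward_a, reward_b):
    """Bradley-Terry model: probability a rater prefers response A over
    response B, given each response's scalar reward score."""
    return 1 / (1 + math.exp(-(reward_a - reward_b)))

# Invented reward scores from a hypothetical reward model.
helpful_answer = 2.0   # clear, direct response
evasive_answer = -1.0  # vague non-answer

p = preference_probability(helpful_answer, evasive_answer)
print(f"P(rater prefers helpful answer) = {p:.2f}")  # ≈ 0.95
```

During RLHF the reward model is trained so these probabilities match the human raters' actual choices, and the LLM is then tuned to produce responses that score highly - which is what pushes its output from merely plausible towards helpful.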
Think like an LLM
Spot the hallucination
LLMs can produce responses that sound completely confident and authoritative - but contain invented facts. This is called hallucination. Below are three AI responses to the same question. Two are accurate. One contains a hallucination. Click the response you think is the hallucination.
Notice how all three responses use the same confident, authoritative tone. This is what makes hallucination so dangerous - the style gives you no clue which answer to trust.
Questions worth thinking about
What to remember
Explore further
Wikipedia makes an excellent starting point for established computing concepts. For any specific fact or claim, scroll to the References section at the bottom of the article and go to the primary source directly.
Check your understanding
Exam-style practice
Practice what you've learned
Three printable worksheets covering tokens, transformers, prompts, and hallucinations at three levels: Recall, Apply, and Exam-style.