Large Language Models:
How does ChatGPT work?
ChatGPT, Gemini, Claude and GPT-4 have changed how millions of people interact with computers. This lesson demystifies how they actually work - and why they sometimes state things that are completely wrong with complete confidence.
In 2023, a New York lawyer named Steven Schwartz submitted a legal brief citing six cases as precedents.
He had asked ChatGPT to help with research. The cases looked convincing - correct formatting,
plausible case names, realistic court references. Every single one of them was completely fabricated.
None of them existed.
When the judge asked Schwartz to produce the case documents, he could not. He told the court he had
not realised that ChatGPT could produce false information. He was fined and sanctioned.
ChatGPT had not lied. It had done exactly what it is designed to do: produce plausible-sounding text.
It just had no mechanism to know whether that text was true.
Mata v. Avianca, Inc. Southern District of New York, 2023.
Tokens: the atoms of language models
A large language model does not read words the way humans do. It reads tokens - chunks of characters that are more fine-grained than words but coarser than individual letters. Common words are single tokens, so "The cat sat on the mat" breaks into one token per word. Rare words are split into multiple tokens, so a rarer phrase breaks into more tokens than it has words.
GPT-4 Turbo has a context window of 128,000 tokens - roughly 100,000 words of English text. This is everything the model can "see" at once. Nothing outside the context window affects its output.
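The splitting can be sketched in a few lines of pure Python. This is a toy greedy longest-match tokenizer over an invented mini-vocabulary - real LLM tokenizers use byte-pair encoding learned from data - but it shows the same effect: common words stay whole, rare words fragment.

```python
# Invented mini-vocabulary: a few frequent words plus common fragments.
VOCAB = {"the", "cat", "sat", "on", "mat",
         "anti", "dis", "establish", "ment", "arian", "ism"}
VOCAB |= set("abcdefghijklmnopqrstuvwxyz")  # single letters as a fallback

def tokenize(text):
    """Greedy longest-match tokenization of a lowercase word sequence."""
    tokens = []
    for word in text.lower().split():
        i = 0
        while i < len(word):
            # Take the longest vocabulary entry that matches at position i.
            for j in range(len(word), i, -1):
                if word[i:j] in VOCAB:
                    tokens.append(word[i:j])
                    i = j
                    break
    return tokens

print(tokenize("The cat sat on the mat"))
# one token per word: ['the', 'cat', 'sat', 'on', 'the', 'mat']
print(tokenize("antidisestablishmentarianism"))
# one rare word fragments into six tokens:
# ['anti', 'dis', 'establish', 'ment', 'arian', 'ism']
```

The single-letter fallback guarantees any word can be tokenized; real tokenizers do the same at the byte level, so no input is ever out of vocabulary.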
Next-token prediction at an extraordinary scale
The core task of an LLM during training is deceptively simple: predict the next token given all the tokens before it. Given "The cat sat on the ___", what comes next? Given millions of examples of human writing, the model learns which tokens tend to follow which sequences.
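The idea can be illustrated with a toy next-token predictor that simply counts which token follows which in a tiny invented corpus. An LLM does conceptually the same job, but with a neural network that generalises across billions of examples instead of a lookup table.

```python
from collections import Counter, defaultdict

# Tiny invented corpus for illustration.
corpus = ("the cat sat on the mat . "
          "the dog sat on the rug . "
          "the cat sat on the mat .").split()

# Count, for each token, which tokens follow it and how often.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(token):
    """Return the token most often seen after `token` in the corpus."""
    return follows[token].most_common(1)[0][0]

print(predict_next("sat"))  # → 'on' ('sat' is always followed by 'on' here)
print(follows["the"])       # 'cat' and 'mat' are the most frequent continuations
```

Note what this toy model cannot do: it has no notion of truth, only of frequency. That limitation carries over, in a more subtle form, to full-scale LLMs.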
This is made possible by the attention mechanism - the key innovation in the Transformer architecture (from the 2017 paper "Attention Is All You Need" by Vaswani et al.). Attention allows the model to consider every previous token when predicting the next one, and to weight some tokens as more relevant than others. When the model processes "The doctor picked up her coat", attention lets it connect "her" back to "doctor", even across many intervening tokens.
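A minimal sketch of the scaled dot-product attention at the heart of the Transformer, in pure Python with made-up 2-dimensional vectors (real models use hundreds of dimensions and learned projections, so every number here is an illustrative assumption):

```python
import math

def softmax(xs):
    """Turn raw scores into positive weights that sum to 1."""
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query over a token sequence.

    Each token contributes a key (used for matching against the query)
    and a value (its content). The output is one weight per token plus
    the weighted blend of the values.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    blended = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return weights, blended

# Invented toy vectors for the tokens "The", "doctor", "picked", "up", "her".
tokens = ["The", "doctor", "picked", "up", "her"]
keys   = [[0.1, 0.0], [0.9, 0.8], [0.2, 0.1], [0.1, 0.2], [0.8, 0.9]]
values = keys            # for simplicity, values equal keys here
query  = [1.0, 1.0]      # what the model is "looking for" at this step

weights, _ = attention(query, keys, values)
for token, w in zip(tokens, weights):
    print(f"{token:>7}: {w:.2f}")
# 'doctor' and 'her' get the largest weights: their key vectors point in
# the same direction as the query, so attention links them together.
```

The weights are recomputed at every step for every token, which is what lets the model relate words regardless of how far apart they sit in the context window.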
After pre-training (learning from raw text), LLMs go through RLHF (Reinforcement Learning from Human Feedback). Human raters compare pairs of model responses and indicate which is better. The model learns to produce responses that humans prefer - which is why ChatGPT feels helpful rather than just statistically plausible.
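The comparison step is commonly modelled with the Bradley-Terry formulation used to train the reward model: the probability that a rater prefers response A over response B is a sigmoid of the gap between their reward scores. A toy version in pure Python (the scores are invented for illustration):

```python
import math

def preference_probability(reward_a, reward_b):
    """Bradley-Terry model: probability a rater prefers response A over
    response B, given each response's scalar reward score."""
    return 1 / (1 + math.exp(-(reward_a - reward_b)))

# Invented reward scores from a hypothetical reward model.
helpful_answer = 2.0   # clear, direct response
evasive_answer = -1.0  # vague non-answer

p = preference_probability(helpful_answer, evasive_answer)
print(f"P(rater prefers helpful answer) = {p:.2f}")  # ≈ 0.95
```

During RLHF the reward model is trained so these probabilities match the human raters' actual choices, and the LLM is then tuned to produce responses that score highly - which is what pushes its output from merely plausible towards helpful.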
Think like an LLM
Spot the hallucination
LLMs can produce responses that sound completely confident and authoritative - but contain invented facts. This is called hallucination. Below are three AI responses to the same question. Two are accurate. One contains a hallucination. Click the response you think is the hallucination.
Notice how all three responses use the same confident, authoritative tone. This is what makes hallucination so dangerous - the style gives you no clue which answer to trust.
Questions worth thinking about
What to remember
Explore further
Wikipedia makes an excellent starting point for established computing concepts. For any specific fact or claim, scroll to the References section at the bottom of the article and go to the primary source directly.
Check your understanding
Exam-style practice
Practice what you've learned
Three printable worksheets covering tokens, transformers, prompts, and hallucinations at three levels: Recall, Apply, and Exam-style.