Lesson 3 of 6
Understanding AI Lesson 3 - Classification and Decision Making

Classification and Decision Making

Classification is what most AI systems do in practice: look at an input and assign it to a category. This lesson covers decision trees, confidence scores, and what happens when the AI gets it wrong.


In 2020, a study published in Nature found that an AI could detect breast cancer from mammograms more accurately than radiologists, reducing false negatives by 9.4% and false positives by 5.7%. The AI was classifying each scan into one of two categories: cancer or no cancer.

But here is the uncomfortable part. The AI could not say why it made each decision. The radiologist could point to the specific area of concern and explain their reasoning. The AI just gave a probability: 87% likely malignant. You either trusted it or you did not.

McKinney et al., Nature, 2020.

Think: Would you want a medical decision about you made by a system that cannot explain its reasoning? What would need to change before you would trust it?

What classification means

Classification is the task of assigning an input to one of several predefined categories. It is probably the most common task in applied machine learning.

Binary classification assigns each input to one of exactly two classes: spam or not spam, fraud or legitimate, cancer or no cancer, pass or fail. The output is a decision plus a confidence score - a probability between 0 and 1 representing how certain the model is.

Multi-class classification assigns each input to one of several categories: a photo to "cat," "dog," or "rabbit." A handwritten digit to 0 through 9. A piece of text to one of dozens of topics. The model produces a confidence score for every possible class, and picks the highest.

Confidence scores matter
A model that says "92% spam" is very different from one that says "51% spam." Many real systems use a threshold: only flag something as spam if confidence is above 70%, for example. Setting this threshold is a human decision with real consequences - too high and you miss spam, too low and you block legitimate emails.
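The thresholding idea above can be sketched in a few lines. This is a minimal illustration, not any particular spam filter's implementation; the function name and the 0.7 threshold are invented for the example.

```python
# A minimal sketch of acting on a confidence score (illustrative values only).
def classify(spam_probability, threshold=0.7):
    """Turn a model's confidence score into a decision using a human-chosen threshold."""
    return "spam" if spam_probability >= threshold else "not spam"

print(classify(0.92))  # high confidence: flagged as spam
print(classify(0.51))  # barely above chance: not flagged at a 0.7 threshold
```

Note that the model's output (the probability) and the decision rule (the threshold) are separate: changing the threshold changes the behaviour of the system without retraining the model at all.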

Decision trees are one of the oldest and most interpretable classification methods. They work by asking a series of yes/no questions about the input features, following different branches based on the answers, until they reach a leaf node - a final classification. Unlike neural networks, every decision is explainable: you can trace exactly why the model classified something the way it did.

Precision vs Recall
Precision: Of all the things the model labelled as positive, how many actually were? High precision means few false alarms.

Recall: Of all the actual positives, how many did the model find? High recall means few things missed.

In cancer screening, you want high recall - missing a real cancer is worse than a false alarm. In spam filtering, you want high precision - blocking a real email is worse than missing spam.
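The two definitions above reduce to simple ratios over the counts of true positives, false positives, and false negatives. The counts in this sketch are invented for illustration.

```python
# Precision and recall computed from raw counts (hypothetical figures).
def precision(true_positives, false_positives):
    """Of everything labelled positive, what fraction actually was?"""
    return true_positives / (true_positives + false_positives)

def recall(true_positives, false_negatives):
    """Of all actual positives, what fraction did the model find?"""
    return true_positives / (true_positives + false_negatives)

# Example: 90 true positives, 10 false positives, 30 false negatives
print(precision(90, 10))  # 0.9  -> few false alarms
print(recall(90, 30))     # 0.75 -> a quarter of real positives were missed
```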

A decision tree, visualised

This is how a decision tree classifies an email as spam or not spam. Each node is a question; each branch is a yes/no answer; each leaf is a classification.

Does the email contain "FREE" or "WINNER"?
├── YES → Is the sender in your contacts?
│        ├── YES → Not Spam
│        └── NO  → Spam
└── NO  → Does it have more than 3 links?
         ├── YES → Review
         └── NO  → Not Spam

Notice how every decision is traceable. You can always answer "why did it classify this as spam?" - which is something neural networks often cannot do.
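Because every branch of the tree is an explicit question, the whole tree can be written as ordinary if/else rules. The sketch below transcribes the tree shown above; the function and parameter names are invented for the example.

```python
# The spam decision tree above, written as plain if/else rules.
def classify_email(contains_free_or_winner, sender_in_contacts, link_count):
    if contains_free_or_winner:
        # Known senders get the benefit of the doubt
        return "not spam" if sender_in_contacts else "spam"
    else:
        # No trigger words: only many links raise suspicion
        return "review" if link_count > 3 else "not spam"

print(classify_email(True, False, 0))   # trigger word, unknown sender -> spam
print(classify_email(False, False, 5))  # no trigger word, many links -> review
```

Tracing a decision is just reading which branch of the `if` fired, which is exactly the interpretability property the text describes.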

Walk the tree

Animal Classifier - Decision Tree: answer yes/no questions to classify an animal.

You are walking through a decision tree that classifies animals. Answer each question about the animal you are thinking of; the tree follows your answers to a classification. The first question: does the animal have four legs?

False positive or false negative?

Precision and recall sound abstract. These six real-world scenarios will make them concrete. For each one, decide whether the AI outcome described is a false positive (AI flags something that is not actually there) or a false negative (AI misses something that actually is there).

A spam filter allows a phishing email to reach your inbox. You almost click a malicious link.
An airport security AI flags an innocent passenger as a potential threat. They are taken aside for questioning and miss their flight.
A cancer screening AI gives a patient the all-clear. The patient actually has early-stage cancer that could have been treated.
A self-driving car brakes sharply for a shadow on the road that it classifies as a pedestrian. No one is actually there.
A content moderation AI removes a legitimate news article about protest violence, classifying it as harmful content.
A plagiarism detector passes a student's essay even though 60% of it was copied directly from the internet.

Questions worth thinking about

Question 1
A facial recognition system has 98% accuracy. In a city of 1 million people, if 1,000 are actually on a wanted list, how many false positives would be generated if you scan everyone?
Working through the maths: 2% false positive rate on 999,000 innocent people = 19,980 false positives. The system would correctly identify roughly 980 of the 1,000 wanted people (98% recall), but generate about 19,980 false alarms. For every wanted person correctly identified, roughly 20 innocent people would be falsely flagged. This is why high accuracy is not enough for high-stakes decisions - the base rate (how rare the thing you are looking for is) matters enormously. This is called the base rate fallacy.
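The arithmetic in that answer can be checked directly. The figures below come from the question itself (a 2% false positive rate and 98% recall are the illustrative reading of "98% accuracy"), not from any real deployment.

```python
# The base-rate arithmetic from Question 1, step by step.
population = 1_000_000
wanted = 1_000
innocent = population - wanted        # 999,000 people not on the list
false_positive_rate = 0.02            # 2% of innocent people wrongly flagged
recall_rate = 0.98                    # 98% of wanted people correctly found

false_positives = round(innocent * false_positive_rate)  # 19,980 false alarms
true_positives = round(wanted * recall_rate)             # 980 correct matches

print(false_positives, true_positives)
print(false_positives / true_positives)  # roughly 20 false alarms per correct match
```

The striking part is that nothing here depends on the model being bad: the 20-to-1 ratio falls out purely from the rarity of the positive class.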
Question 2
In cancer screening, should you prioritise precision or recall? What about in a spam filter? Why might the right answer be different?
Key points: In cancer screening, missing a real cancer (false negative) is potentially fatal. So you prioritise high recall - catch everything, even if it means more false alarms that lead to unnecessary follow-up tests. The cost of a false negative is higher than a false positive. In spam filtering, the opposite is often true: accidentally blocking a real email (false negative for spam = false positive for legitimate email) is annoying and potentially costly for businesses. So you prioritise precision - only flag something as spam when you are confident. The right trade-off depends entirely on the relative costs of each type of error.
Question 3
Decision trees are interpretable - you can explain every decision. Neural networks are often not. When does interpretability matter, and when does it not?
Key points: Interpretability matters most when: decisions significantly affect individuals (loans, sentencing, medical diagnosis), when accountability is legally required (GDPR gives people the right to explanation for automated decisions), when debugging requires understanding why errors occurred, or when trust needs to be built with stakeholders. It matters less when: decisions are low-stakes (recommending a playlist), when accuracy is so much higher that the trade-off is justified, or when the system is one input among many that a human reviews. The EU AI Act 2024 specifically requires explainability for high-risk AI systems.

What to remember

Core takeaways - Lesson 3
1. Classification assigns inputs to categories. Binary classification uses two classes; multi-class uses more. Almost every practical AI system performs some form of classification.
2. Confidence scores are probabilities. A model rarely says "definitely spam" - it says "87% spam." The threshold at which you act on that score is a human decision.
3. Decision trees are interpretable. Every classification can be traced back through a series of questions. This makes them popular in regulated industries where decisions must be explained.
4. Precision and recall measure different things. Precision: how often is a positive prediction correct? Recall: how many actual positives were found? The right trade-off depends on the cost of each type of error.
5. High accuracy does not mean safe to deploy. When the thing you are looking for is rare, even a highly accurate system generates enormous numbers of false positives.

Explore further

Wikipedia makes an excellent starting point for established computing concepts. For any specific fact or claim, scroll to the References section at the bottom of the article and go to the primary source directly.

In The News
Metropolitan Police roll out live facial recognition across London (2023-2024)
2023 onwards - ongoing
From 2023, the Metropolitan Police began deploying live facial recognition cameras at high streets, shopping centres and events across London, scanning passers-by and matching faces against a watchlist in real time. Civil liberties groups raised serious concerns: in independent trials, the system misidentified people at rates as high as 81% in early deployments. In 2023, a man was wrongly stopped and searched after the AI matched his face to a wanted person who did not resemble him. No specific UK law authorises or restricts this use, meaning police are using the technology under general policing powers while Parliament has not yet debated a dedicated law.
Discussion questions
In a crowd of 100,000 people, an 81% false positive rate would mean tens of thousands of wrong identifications. Who bears responsibility for each wrongful stop: the police officer who acts on the alert, or the team that deployed the AI?
Unlike the EU (which has banned real-time facial recognition in most public spaces), the UK has no specific law on this. Should Parliament pass one? What would it need to say to be fair to both civil liberties and public safety?
If you were walking down the high street and an AI system incorrectly flagged your face as matching a criminal suspect, what rights would you want to have? Who should you be able to complain to, and how quickly?
Read more: Facial Recognition Systems (Wikipedia)

Check your understanding

Five questions - answer all of them before checking your reasoning.
Question 1
A model classifies an email as "spam" with 53% confidence. What does this tell you?
The email is definitely spam
The model is only slightly more confident it is spam than not spam - this is a low-confidence prediction
53% of emails like this one are spam
The model has made an error
Question 2
What is the main advantage of a decision tree over a neural network?
Decision trees are always more accurate
Decision trees can process more data
Decision trees are interpretable - every decision can be traced through a sequence of questions
Decision trees do not require training data
Question 3
In a medical diagnosis system, which is generally worse: a false positive or a false negative?
A false positive - diagnosing illness in a healthy person
A false negative - missing a real illness in a sick person
They are always equally bad
It depends only on the cost of treatment
Question 4
What is multi-class classification?
Classifying data into exactly two categories
Using multiple models to classify the same input
Assigning an input to one of three or more categories
Classifying data without using any training labels
Question 5
A model identifying stolen credit cards has 99.5% accuracy. In a dataset of 100,000 transactions, 200 are fraudulent.
Approximately how many legitimate transactions might be incorrectly flagged as fraud?
0 - the model is 99.5% accurate
About 500 - 0.5% of 99,800 legitimate transactions
About 2 - it is 99.5% accurate
About 5,000 - fraud is rare so errors pile up

Exam-style practice

Write a structured answer
A hospital is considering using a machine learning classifier to help diagnose heart disease from patient data. Discuss the advantages and disadvantages of using a decision tree compared to a neural network for this application. [6 marks]
Mark scheme - 6 marks
Decision tree advantage: interpretable - doctors can trace exactly why a diagnosis was made, making it easier to verify and trust. (1 mark)
Decision tree advantage: meets regulatory requirements - GDPR and healthcare regulations may require explainability for automated medical decisions. (1 mark)
Decision tree disadvantage: typically lower accuracy than neural networks, especially on complex patterns in data. (1 mark)
Neural network advantage: can find complex non-linear patterns in patient data that a decision tree would miss, potentially giving higher accuracy. (1 mark)
Neural network disadvantage: opaque / black box - cannot explain why it made a specific diagnosis, creating accountability problems in a medical context. (1 mark)
Evaluation point: in a medical context, interpretability may outweigh accuracy advantages, as doctors need to understand and verify AI decisions before acting on them. (1 mark)
Award marks for any well-reasoned advantage or disadvantage correctly attributed to the right model type. Evaluation mark requires a justified conclusion about which is more appropriate.
Printable Worksheets

Practice what you've learned

Three printable worksheets covering classifiers, decision trees, and false positives at three levels: Recall, Apply, and Exam-style.

Exam Practice
Lesson 3: Classification and Decision Making
GCSE-style written questions covering AI concepts. Work through them like an exam.
Lesson 3 - Teacher Resources
Classification and Decision Making
Suggested starter (5 min)
Ask the class: "You're designing an AI to screen luggage at an airport. Would you rather it missed 1 in 100 real threats, or flagged 1 in 50 innocent bags for a manual search?" Take a show of hands, then ask students to justify their position. This surfaces the precision/recall trade-off through intuition before the formal definitions appear.
Lesson objectives
1. Explain how a decision tree classifies an input by working through a sequence of yes/no questions.
2. Define precision and recall, and distinguish between a false positive and a false negative.
3. Evaluate the trade-off between precision and recall in a given context, explaining which matters more and why.
Key vocabulary (board-ready)
Classification
Sorting an input into one of a set of predefined categories (e.g., spam/not spam, benign/malignant).
Decision tree
A flowchart-like model that classifies inputs by asking a series of yes/no questions based on features of the data.
False positive
When the model predicts a positive outcome (e.g., spam, tumour) but the actual answer is negative - a wrongful identification.
False negative
When the model predicts a negative outcome but the actual answer is positive - a missed case.
Precision
The proportion of positive predictions that were actually correct. High precision means few false positives.
Recall
The proportion of actual positive cases the model correctly identified. High recall means few false negatives.
Discussion prompts
A cancer screening AI has 99% precision but 60% recall. What does this mean for patients? Which error type is more acceptable in this context?
A facial recognition system at a 10,000-person public event has a 5% false positive rate. Ask students to calculate how many innocent people will be flagged. Is that number acceptable?
Why might a company's claim of "99% accurate AI" be misleading? What questions should you always ask about any published accuracy figure?
Common misconceptions
✗ "High accuracy means the model is reliable" - a model predicting 'no cancer' for every patient achieves 99% accuracy on a dataset where 1% have cancer, while being completely useless.
✗ "Precision and recall can both be maximised at the same time" - they trade off against each other. Increasing precision typically reduces recall and vice versa.
✗ "False positives and false negatives are equally serious" - context determines which is more harmful. In cancer screening, a missed diagnosis is far worse than an unnecessary follow-up test.
Exit ticket questions
Define false positive and give one real-world example.
[2 marks]
A cancer screening test correctly identifies 94% of all cancers. Which metric does this describe - precision or recall?
[1 mark]
Explain why a medical diagnosis AI should be designed to prioritise recall over precision.
[2 marks]
Homework idea
Choose one real AI classification system (image recognition, spam filtering, medical diagnosis, or credit scoring). Explain what a false positive and a false negative would mean in that specific context. Then explain which type of error is more acceptable and why. Aim for 150-200 words.
Classroom tips
The FP/FN sorting activity works best as a paired task before the whole-class precision/recall discussion.
Ground precision and recall in a concrete context (medical screening vs spam filtering) before introducing the formula. Students who see the formula first rarely understand it.
Timing: 25 minutes independent / 40 minutes with the facial recognition discussion.
Resources
AI Ethics Exam Practice
Student worksheet (PDF)