Lesson 4 of 6
Understanding AI Lesson 4 - Neural Networks

Neural Networks:
the Basics

The architecture behind almost every modern AI system. This lesson explains how artificial neurons work, how they connect into layers, and how backpropagation enables learning.


A two-year-old child can recognise their parent's face from any angle, in any lighting, after seeing it thousands of times. No one sat the child down and wrote rules about the distance between eyes, the shape of the nose, or the curvature of the jaw. The brain simply learned, from exposure and feedback, what faces look like.

In 2012, a team at the University of Toronto built a neural network called AlexNet. They fed it 1.2 million labelled images and let it learn. It cut the image recognition error rate almost in half overnight. Nobody told it what an edge was, or what a texture was, or what a face was. It worked it out from the data. The modern AI revolution started there.

Krizhevsky, Sutskever and Hinton. ImageNet Classification with Deep Convolutional Neural Networks. 2012.

Think: The brain has roughly 86 billion neurons and 100 trillion connections. AlexNet had about 650,000 neurons and 60 million parameters. It still worked remarkably well. Why do you think the architecture of the network matters more than the raw size?

From biological to artificial neurons

A biological neuron receives electrical signals from other neurons through its dendrites. If the total incoming signal is strong enough, it fires - sending a signal down its axon to other neurons. Everything in your brain is ultimately patterns of these firing events.

An artificial neuron works similarly. It takes several numeric inputs, multiplies each by a weight (how important that input is), adds them together, then passes the total through an activation function that decides whether and how strongly the neuron fires. The output becomes an input to the next layer of neurons.
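This calculation can be sketched in a few lines of Python. This is a minimal illustration using a sigmoid activation; the input values (5 and 3) and weights (1.0 and -1.0) are the same example numbers used in the neuron tuner later in this lesson:

```python
import math

def sigmoid(x):
    # Squashes any number into the range 0 to 1
    return 1 / (1 + math.exp(-x))

def neuron(inputs, weights):
    # Step 1: multiply each input by its weight and add them up
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    # Step 2: pass the total through the activation function
    return sigmoid(weighted_sum)

# Two inputs, two weights (one positive, one negative)
output = neuron([5, 3], [1.0, -1.0])
print(round(output, 2))  # 0.88 - the neuron fires strongly
```

Changing either the inputs or the weights changes the weighted sum, and therefore how strongly the neuron fires.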

Weights are everything
The weights determine what the network has "learned." When we say a neural network is training, we mean: it is adjusting its weights based on how wrong its predictions are. A neural network with a million parameters has a million weights, all being tuned simultaneously.

Neurons are organised into layers. The first layer receives the raw input (pixel values, word frequencies, sensor readings). The final layer produces the output (a classification, a probability, a generated token). Every layer in between is a hidden layer - it learns increasingly abstract representations of the data.

A network with many hidden layers is called a deep neural network. This is where the term deep learning comes from.

How does the network learn? Backpropagation.
After each prediction, the network calculates how wrong it was (the loss). It then works backwards through every layer, calculating how much each weight contributed to that error - this backwards calculation is backpropagation. Each weight is then nudged in the direction that would have made the error smaller (gradient descent). Repeated millions of times, this loop is how the network learns. The network is never handed the correct weights directly - it just keeps failing and adjusting until it gets better.
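For a single neuron, the whole learning loop can be sketched directly in Python. This is a toy illustration with one made-up training example, not full backpropagation - a real network applies the same idea backwards through every layer:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# One training example: two inputs and the correct answer (target)
inputs, target = [5, 3], 0.0       # we want this neuron suppressed
weights = [1.0, -1.0]              # starting weights (output ~0.88, far too high)
learning_rate = 0.1

for _ in range(200):
    # Forward pass: make a prediction
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    prediction = sigmoid(weighted_sum)
    # How wrong was it?
    error = prediction - target
    # Backward pass: how much the weighted sum contributed to the error
    # (error scaled by the slope of the sigmoid at this point)
    grad = error * prediction * (1 - prediction)
    # Nudge each weight in the direction that shrinks the error
    weights = [w - learning_rate * grad * i for w, i in zip(weights, inputs)]

final = sigmoid(sum(i * w for i, w in zip(inputs, weights)))
print(round(final, 3))  # a small value near 0 - the neuron has learned to stay suppressed
```

Each pass through the loop makes the prediction a little less wrong; no one ever tells the neuron what its weights should be.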

Watch signals flow through a network

Click "Forward Pass" to watch a signal travel from the input layer, through the hidden layers, to the output. Each lit neuron represents one that has "fired." The brightness represents its activation strength.

Press Forward Pass to send a signal through the network. Watch how each layer activates in sequence.

Tune a single neuron

Every neuron in a neural network does the same calculation: multiply each input by its weight, add them all up, then squash the result through an activation function. Adjust the sliders below and watch the calculation update live at every step.

Input A (signal strength)
5
Weight A (importance - can be negative)
1.0
Input B (signal strength)
3
Weight B (importance - can be negative)
-1.0
Live calculation - step by step
Step 1 - A contribution:
5 x 1.0 = 5.0
Step 2 - B contribution:
3 x -1.0 = -3.0
Step 3 - Sum (weighted):
5.0 + (-3.0) = 2.0
Step 4 - Sigmoid activation:
sigmoid(2.0) = 0.88
Neuron output - how strongly is it firing?
0.88
The neuron is firing strongly. A positive weighted sum makes sigmoid produce a value close to 1.
Challenge: Try to get the neuron output below 0.1 (almost completely suppressed). What combination of weights achieves this?
Current output: 0.88 - not there yet

The sigmoid function compresses any number into the range 0 to 1. Very large sums produce outputs near 1 (firing). Very negative sums produce outputs near 0 (suppressed). This is how a neuron "decides" how strongly to pass a signal to the next layer - and during training, the weights are adjusted automatically to produce the right outputs.
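One combination that completes the suppression challenge above, checked in Python (other weight settings work too - anything that makes the weighted sum strongly negative):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Inputs are fixed at A=5, B=3; only the weights can change.
# Making both weights negative drives the weighted sum far below zero.
weighted_sum = (5 * -1.0) + (3 * -1.0)   # = -8.0
output = sigmoid(weighted_sum)
print(round(output, 4))  # 0.0003 - well below the 0.1 target
```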

One forward pass

A small neural network classifies emails as spam or not. Work through each calculation step manually - tracing the values from input all the way to the output decision. Complete each step in order to unlock the next.

The network - spam email classifier
x1 = 5 (suspicious word count)
x2 = 2 (external link count)
H1 (hidden neuron 1): w1=0.2, w2=0.5, H1 = ?
H2 (hidden neuron 2): w1=-0.4, w2=0.3, H2 = ?
Output neuron: w1=0.5, w2=1.0, Output = ? (spam?)

Activation: ReLU - any negative weighted sum becomes 0. If final output > 0.5 the email is classified as SPAM.

Step 1 - Weighted sum into H1
Multiply each input by its weight and add them together. (x1 x w1) + (x2 x w2) = (5 x 0.2) + (2 x 0.5) = ?
Step 2 - Apply ReLU to H1
Apply the ReLU activation function to the weighted sum you just calculated. ReLU(2.0) = max(0, 2.0) = ?
Step 3 - Weighted sum into H2
Same calculation for the second hidden neuron - different weights, different result. (x1 x w1) + (x2 x w2) = (5 x -0.4) + (2 x 0.3) = ?
Step 4 - Apply ReLU to H2
Apply ReLU to H2's weighted sum. Remember: negative values become 0. ReLU(-1.4) = max(0, -1.4) = ?
Step 5 - Output neuron weighted sum
Calculate the output neuron's weighted sum using the H1 and H2 outputs you found. (H1 x 0.5) + (H2 x 1.0) = ?
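The five steps above can be checked with a short Python sketch, using exactly the inputs and weights from the diagram:

```python
def relu(x):
    # ReLU: any negative weighted sum becomes 0
    return max(0, x)

x1, x2 = 5, 2                       # suspicious words, external links

# Hidden neuron 1: weights 0.2 and 0.5
h1 = relu(x1 * 0.2 + x2 * 0.5)      # ReLU(2.0) = 2.0

# Hidden neuron 2: weights -0.4 and 0.3
h2 = relu(x1 * -0.4 + x2 * 0.3)     # ReLU(-1.4) = 0

# Output neuron: weights 0.5 and 1.0
output = h1 * 0.5 + h2 * 1.0

# If the final output is above 0.5, classify as SPAM
print(output, "SPAM" if output > 0.5 else "NOT SPAM")  # 1.0 SPAM
```

Notice how H2's negative weighted sum is silenced by ReLU, so only H1 contributes to the final decision.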

Questions worth thinking about

Question 1
Why do neural networks need hidden layers? What would happen if you connected the input directly to the output with no layers in between?
Key points: A direct input-to-output connection (no hidden layers) can only learn linear relationships - it is essentially a simple equation. Real-world problems are rarely linear. A photo of a cat is not linearly separable from a photo of a dog. Hidden layers allow the network to learn non-linear, hierarchical representations. The first hidden layer might learn to detect edges. The second might combine edges into shapes. The third might combine shapes into object parts. By stacking layers, the network builds increasingly abstract understanding. This is why depth (number of layers) is so important in modern AI.
Question 2
Training a large neural network requires enormous amounts of electricity. GPT-4 reportedly required tens of millions of dollars' worth of compute. Is this a problem? Who should bear the cost?
Key points: Large-scale model training has a significant carbon footprint. Strubell et al. (2019) estimated that training a single large NLP model can produce as much CO2 as five cars over their lifetimes. This creates a barrier to entry: only well-funded organisations can train frontier models. It also raises questions about whether the benefits justify the environmental cost. Counter-arguments include: models are trained once and used billions of times (inference is comparatively cheap), and more efficient architectures continue to reduce training costs. The question of who bears the cost - investors, users via subscriptions, governments via regulation - is a live policy debate in the UK and EU.
Question 3
Neural networks are often called "black boxes." What does this mean, and why does it matter for safety-critical applications?
Key points: A black box system is one where you can observe the inputs and outputs but not explain what happens in between. With millions of weights interacting, it is not practically possible to trace why a neural network made a specific decision. This matters in safety-critical domains: if an autonomous car's network makes a fatal decision, engineers may not be able to determine why. In medical diagnosis, a doctor cannot verify the reasoning. In criminal justice, a judge cannot explain an AI sentencing recommendation. The EU AI Act 2024 designates high-risk AI systems (in healthcare, policing, education) as requiring greater transparency and explainability, specifically because of this black-box problem.

What to remember

Core takeaways - Lesson 4
1
Artificial neurons multiply inputs by weights, sum the results, and pass them through an activation function. Many connected neurons form a network.
2
Networks have layers: input, hidden (one or more), and output. Hidden layers learn abstract representations of the data.
3
Weights are what the network learns. Training adjusts the weights to reduce prediction error. A model with millions of weights can capture enormously complex patterns.
4
Backpropagation is the algorithm that adjusts weights by working backwards through the network, calculating how much each weight contributed to the error.
5
Deep learning refers to neural networks with many hidden layers. More depth allows more complex patterns - but requires far more training data and compute.

Explore further

Wikipedia makes an excellent starting point for established computing concepts. For any specific fact or claim, scroll to the References section at the bottom of the article and go to the primary source directly.

In The News
Nobel Prize awarded for the mathematics behind neural networks (2024)
October 2024
In October 2024, the Nobel Prize in Physics was awarded to Geoffrey Hinton and John Hopfield for their foundational work on artificial neural networks - the very architecture you have been studying in this lesson. The Nobel Committee described Hopfield's work on associative memory networks (1982) and Hinton's development of the Boltzmann machine as discoveries that "laid the groundwork for the machine learning revolution." In the same month, the Nobel Prize in Chemistry was awarded to Demis Hassabis and John Jumper of Google DeepMind, alongside David Baker, for AlphaFold - a neural network that solved the protein-folding problem, predicting the 3D structure of virtually every known protein. AlphaFold's predictions, made freely available to researchers worldwide, have been called "a watershed moment for biology." The Nobel Committee described it as "almost like a miracle."
Discussion questions
AlphaFold solved in months a problem that had stumped structural biologists for 50 years. Does knowing that change how you think about what AI is and is not capable of? What problems remain that AI still cannot solve?
The Nobel Committee credited Hinton and Hopfield - the human researchers - not the neural networks they built. Should AI systems ever be credited or rewarded for discoveries they make? What would this even mean in practice?
DeepMind released AlphaFold's protein structure database for free. OpenAI and other companies keep their most powerful models behind paywalls or APIs. What are the arguments for and against open-sourcing powerful AI research?
Read more: AlphaFold (Wikipedia)    Geoffrey Hinton (Wikipedia)

Check your understanding

5 Questions
Answer all five, then submit for instant feedback
Question 1
In a neural network, what is a weight?
The number of neurons in the network
A numeric value that determines how strongly one neuron influences another
The size of the training dataset
The output value of the final layer
Question 2
What is the purpose of an activation function in an artificial neuron?
To store the training data inside the neuron
To decide whether and how strongly the neuron fires based on its total input
To connect the network to the internet
To count the number of training examples
Question 3
What does the term "deep learning" refer to?
A neural network trained on a very large dataset
A neural network with many hidden layers, allowing it to learn complex, hierarchical representations
A type of machine learning that does not require training data
A slower, more careful version of standard machine learning
Question 4
What is backpropagation?
The process of sending data from the output layer back to the input layer to be reused
An algorithm that adjusts weights by calculating how much each contributed to the prediction error and nudging them to reduce it
A method of removing neurons that are not contributing to the network
The process of evaluating the model on test data
Question 5
A neural network is trained to classify handwritten digits (0-9). After training, it achieves 99% accuracy on test data. An engineer then examines the network's internal weights to understand why it classified a 7 as a 1.
Why is this examination likely to be very difficult?
The network does not store its training data internally
Neural networks are black boxes - with millions of interacting weights, it is not practically possible to trace why a specific decision was made
The weights are encrypted for security reasons
The network needs more training before its decisions can be explained

Exam-style practice

Write a structured answer
Describe how a neural network learns from training data. Your answer should refer to weights, error, and backpropagation. [5 marks]
Mark scheme - 5 marks
The network makes a prediction by passing input data forward through the layers (forward pass), with each neuron multiplying inputs by weights and applying an activation function. (1 mark)
The prediction is compared to the correct label (from the training data) to calculate the error / loss - how wrong the prediction was. (1 mark)
Backpropagation works backwards through the network, calculating how much each weight contributed to the error. (1 mark)
Each weight is adjusted (nudged) slightly in the direction that would reduce the error. This process is called gradient descent. (1 mark)
This process (forward pass, calculate error, backpropagate, adjust weights) is repeated many thousands of times across all training examples until the network's predictions are consistently accurate. (1 mark)
Accept any accurate description of the learning loop. All five marks require the answer to progress logically from prediction, through error calculation, to weight adjustment.
Printable Worksheets

Practice what you've learned

Three printable worksheets covering neurons, layers, weights, and the training process at three levels: Recall, Apply, and Exam-style.

Exam Practice
Lesson 4: Neural Networks - the Basics
GCSE-style written questions covering AI concepts. Work through them like an exam.
Lesson 4 - Teacher Resources
Neural Networks - the Basics
Teacher mode (all pages)
Shows examiner notes on the Exam Practice page
Suggested starter (5 min)
Ask: "A child sees 1,000 photos labelled 'cat' and 1,000 labelled 'dog'. After that, they can identify new ones they've never seen. How?" Take brief answers. Then: "A neural network does something similar - using numbers, weights, and millions of examples. Today you'll find out exactly how." This frames the lesson through human intuition before the technical content.
Lesson objectives
1. Describe what an artificial neuron does: receives weighted inputs, sums them, applies a threshold, and produces an output.
2. Explain forward propagation and how an input passes through a network layer by layer to produce a prediction.
3. Describe how training adjusts the network's weights to reduce the error between predictions and correct answers.
Key vocabulary (board-ready)
Artificial neuron
A computational unit that receives weighted inputs, sums them, applies an activation function, and produces an output.
Weight
A numerical value controlling the strength of a connection between two neurons; adjusted during training to reduce error.
Activation function
A mathematical function applied to a neuron's weighted input sum, determining whether and how strongly the neuron fires.
Forward propagation
The process of passing an input through a neural network layer by layer to produce a prediction.
Backpropagation
The algorithm used during training to calculate each weight's contribution to the prediction error and adjust weights to reduce it.
Deep learning
A machine learning approach using neural networks with multiple hidden layers to learn increasingly abstract representations of data.
Discussion prompts
A neural network is 94% accurate at detecting tumours in scans, but nobody can explain why it makes any particular decision. Is this acceptable for clinical use? What would need to change before you would trust it with a diagnosis?
"More layers always means a better neural network." Is this true? When might adding more layers make performance worse rather than better?
Human brains also learn by strengthening connections between neurons. In what ways is this similar to a neural network - and in what ways is it fundamentally different?
Common misconceptions
"Neural networks work like the human brain" - they are loosely inspired by biological neurons but bear little resemblance to actual brain function. The comparison is a metaphor, not a description.
"More layers always means better performance" - deeper networks require more data and training time, and can perform worse on simpler problems.
"The network knows why it made a decision" - neural networks are black boxes. The millions of weight interactions make it practically impossible to trace why any specific decision was made.
Exit ticket questions
What is the role of weights in a neural network?
[1 mark]
Explain what happens during forward propagation in a neural network.
[2 marks]
Why is a trained neural network sometimes described as a 'black box'?
[2 marks]
Homework idea
Describe, in your own words, how a neural network learns to classify emails as spam or not spam. Your answer must mention: training data, weights, error, and backpropagation. Aim for 120-150 words. Write it as if explaining to a Year 9 student who has not seen this lesson - do not copy the lesson text.
Classroom tips
The neuron tuner activity works best if students attempt the calculation manually first, before using the interactive step-by-step reveal.
The suppression challenge extension is well-suited for students who finish early and are confident with the basic mechanism.
Timing: 25 minutes independent / 40 minutes with the "black box" discussion.
Resources
AI Ethics Exam Practice Download student worksheet (PDF) Set as class homework (coming soon)