Computer Systems Lesson 3 of 5

Why is your CPU faster than mine?

Clock speed, cores, cache size and pipelining. Four factors that determine how quickly a processor can work - and each involves a genuine engineering trade-off.

40-50 minutes · Core GCSE CS content · Interactive comparisons

Compare two processors - an Intel Core i3 at 3.6 GHz and an Intel Core i9 at 5.0 GHz. The i9 has a higher clock speed. But the i9 also costs four times as much. Is it four times as fast? And why does an Apple M4 at 4.4 GHz often outperform an Intel running at 5.0 GHz?

The answer: Clock speed is just one of four performance factors. Understanding all four explains why processor benchmarks are more complicated than a single GHz number.

What actually affects CPU performance?

Four factors determine CPU performance:

⏱️ Clock Speed - how many cycles per second the CPU can complete
🔲 Number of Cores - how many independent processing units the chip contains
💾 Cache Size - how much fast-access memory sits close to the CPU
🔄 Pipelining - how the CPU overlaps multiple FDE cycles simultaneously

Clock Speed

The clock speed (measured in GHz) determines how many FDE cycles the CPU can complete per second. 1 GHz = 1 billion cycles per second, so a 4 GHz CPU can theoretically complete 4 billion cycles per second - at best, roughly one simple instruction per cycle.

The limit: Increasing clock speed produces heat. Beyond around 5-6 GHz, modern chips generate so much heat they become unreliable unless cooled with extreme methods. This is why clock speeds have not increased dramatically since 2005 - engineers found other ways to improve performance instead.
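The GHz arithmetic above can be checked with a few lines of Python - a sketch using only the conversion stated in this section:

```python
# A minimal sketch: converting a clock speed in GHz to cycles per second,
# and finding how long a single cycle lasts.

def cycles_per_second(ghz):
    """1 GHz = 1 billion (1e9) cycles per second."""
    return ghz * 1_000_000_000

def cycle_time_ns(ghz):
    """Duration of one clock cycle in nanoseconds (1e9 cycles/s -> 1/ghz ns)."""
    return 1 / ghz

print(cycles_per_second(3.6))  # 3600000000.0 - the 3.6 GHz example below
print(cycle_time_ns(4.0))      # 0.25 - a 4 GHz clock ticks every quarter nanosecond
```

Students can use this to check their own GHz-to-cycles calculations from the lesson.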

Number of Cores

Instead of making one core faster, chip designers added more cores - each an independent processor that can fetch, decode and execute its own instructions simultaneously.

The catch: Multiple cores only help if the software is written to use them (multi-threaded). A single-threaded program runs on one core only - adding more cores gives it no benefit at all. Video editing, 3D rendering and scientific simulations benefit enormously from many cores. Opening a spreadsheet on a single tab barely uses more than one.
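A toy model (not a real benchmark) of why core count only helps multi-threaded software - the task count and timing here are illustrative values, not measured figures:

```python
import math

# A minimal model: if a program has N equal, fully independent tasks,
# estimated wall time is tasks / cores, rounded up to whole rounds.
# A single-threaded program behaves as if cores == 1, whatever the chip has.

def estimated_time(tasks, cores, seconds_per_task=1, multithreaded=True):
    usable_cores = cores if multithreaded else 1
    # Tasks are dealt out in rounds; each round takes seconds_per_task.
    return math.ceil(tasks / usable_cores) * seconds_per_task

print(estimated_time(8, 1))                       # 8 - one core, one task at a time
print(estimated_time(8, 4))                       # 2 - two rounds of four tasks
print(estimated_time(8, 4, multithreaded=False))  # 8 - extra cores sit idle
```

The third call is the key exam point: a single-threaded program gains nothing from the extra three cores.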

Cache Size

Fetching data from RAM is slow relative to CPU speed. Cache stores recently used data much closer to the CPU core, reducing how often the processor has to wait. A larger cache means more data can be held nearby - fewer cache misses, fewer stalls.

The limit: Cache is extremely expensive to manufacture per byte - far more than RAM. Large caches also increase chip size and power consumption. The design becomes a careful balance between hit rate, cost, size and heat.

Pipelining

Without pipelining, the CPU finishes all three FDE stages of instruction 1 before starting instruction 2. With pipelining, while instruction 1 is being decoded, instruction 2 is already being fetched. Each stage works on a different instruction simultaneously - like an assembly line in a factory.

The complication: If an instruction's result is needed by the next instruction (a data hazard), the pipeline must stall and wait. Modern CPUs use sophisticated techniques including branch prediction and out-of-order execution to minimise these stalls.

Clock speed in numbers

Example: 3.6 GHz = 3,600,000,000 clock cycles per second. Consumer CPUs span roughly 1.0 GHz (slower, low-power) through 2.5-3.5 GHz (typical mid-range laptop) up to 5.0 GHz (faster, high-end desktop).

Cache hits, cache misses - and why they matter

When the CPU needs a piece of data, it does not go straight to RAM. It checks each cache level in turn, starting with the fastest. If the data is there, it is a cache hit and the CPU gets the data almost immediately. If not, it is a cache miss - the CPU must go to the next level, which takes longer.
Memory hierarchy - speed vs size
Each row is where the CPU looks next on a miss.
Level                Size          Cost of a hit
L1 Cache             32-64 KB      ~1-4 cycles
L2 Cache             256 KB-1 MB   ~10-20 cycles
L3 Cache             8-64 MB       ~40-50 cycles
Main Memory (RAM)    8-128 GB      ~200+ cycles
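The lookup walk the table describes can be sketched in a few lines - the latencies are the illustrative figures from the table, not exact values for any real chip:

```python
# Toy model of the memory hierarchy: each level is checked in turn,
# and the access costs the latency of the level where the data is found.
# Latencies (in cycles) are the illustrative figures from the table above.

LEVELS = [
    ("L1", 4),
    ("L2", 20),
    ("L3", 50),
    ("RAM", 200),
]

def access_cost(found_at):
    """Cycles to fetch data that lives at level `found_at` (simplified)."""
    for name, cycles in LEVELS:
        if name == found_at:
            return cycles
    raise ValueError(f"unknown level: {found_at}")

print(access_cost("L1"))   # 4 - a hit in the fastest cache
print(access_cost("RAM"))  # 200 - missed every cache level
```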
A program that repeatedly uses the same variables keeps them in L1 cache - each access costs just 1-4 cycles. A program that constantly reads from large arrays or jumps around in memory causes many L3 misses or even RAM accesses. At 200+ cycles each, those misses add up fast. This is why algorithms that access memory in predictable patterns (like iterating through an array) are faster than those that jump around unpredictably.
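The effect of access patterns can be demonstrated with a toy LRU cache model - the cache size and block size here are made-up illustrative parameters, not real hardware values:

```python
import random

# Toy cache model: holds the 8 most recently used blocks, where 8
# consecutive array elements share one block. Sequential scans reuse a
# freshly loaded block seven more times; random jumps mostly do not.

def hit_rate(addresses, cache_size=8):
    cache, hits = [], 0
    for addr in addresses:
        block = addr // 8          # 8 elements per cache block (illustrative)
        if block in cache:
            hits += 1
            cache.remove(block)    # move to most-recently-used position
        cache.append(block)
        if len(cache) > cache_size:
            cache.pop(0)           # evict the least-recently-used block
    return hits / len(addresses)

sequential = list(range(1000))
scattered = [random.randrange(1000) for _ in range(1000)]
print(f"sequential scan hit rate: {hit_rate(sequential):.0%}")  # 88% (7 of every 8 hit)
print(f"random access hit rate:   {hit_rate(scattered):.0%}")   # much lower, varies per run
```

Asking students to predict the two percentages before running it mirrors the classroom tip later in this lesson.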
Exam focus

When explaining how cache improves performance: say "reduces the number of slow accesses to main memory" and "stores recently/frequently used data closer to the CPU." A cache miss means the CPU must wait while data is fetched from the next level. A larger cache reduces the frequency of cache misses.

More cores: when it helps and when it does not

A multi-core processor contains multiple complete processing units (cores) on a single chip. Each core has its own ALU, CU, registers and L1/L2 cache. All cores share L3 cache and main memory. This allows genuinely parallel execution - different cores work on different tasks at the same time.
Simulation: 8 independent tasks on 1 core run one after another (8 time units); on 4 cores they run in two batches of four (2 time units) - roughly a 4x speedup.
The simulation above assumes all 8 tasks are independent (parallelisable). In reality, many programs have tasks that depend on each other's results. If Task B needs the output from Task A, it must wait - even if other cores are free. This is why the theoretical speedup from adding cores is rarely achieved in practice.
Single-threaded vs multi-threaded: real examples
Uses multiple cores well
  • Video rendering / encoding
  • 3D modelling and animation
  • Scientific simulations
  • Compiling large codebases
  • Running virtual machines
Mostly single-threaded
  • Simple web browsing
  • Spreadsheet calculation
  • Many older games
  • Sequential data processing
  • Most command-line scripts
Exam focus

The key phrase is: "Multiple cores allow multiple instruction streams to execute simultaneously, which improves performance for multi-threaded applications. Single-threaded programs cannot benefit from additional cores as they can only use one core at a time."

How pipelining overlaps instructions

The table below shows two approaches for executing 4 instructions. Without pipelining, each instruction must complete all three stages before the next begins. With pipelining, stages overlap - dramatically increasing throughput.

Without pipelining - 12 clock cycles for 4 instructions:

Instruction  C1  C2  C3  C4  C5  C6  C7  C8  C9  C10 C11 C12
I1           F   D   E   -   -   -   -   -   -   -   -   -
I2           -   -   -   F   D   E   -   -   -   -   -   -
I3           -   -   -   -   -   -   F   D   E   -   -   -
I4           -   -   -   -   -   -   -   -   -   F   D   E

With pipelining - 6 clock cycles for 4 instructions:

Instruction  C1  C2  C3  C4  C5  C6
I1           F   D   E   -   -   -
I2           -   F   D   E   -   -
I3           -   -   F   D   E   -
I4           -   -   -   F   D   E
F = Fetch D = Decode E = Execute

Live simulation: a real program through the pipeline

The program below adds two numbers and stores the result. Step through it cycle by cycle to see exactly which stage each instruction occupies - and what the CPU is doing at each moment.
Pipeline Simulator
Program: load 5, add 3, store result, halt
Stepping through the 6 cycles, the simulator shows the PC, CIR, ACC and MEM[8] (where the result is stored) updating, and which of the F, D and E stages each instruction occupies in each cycle.
F = Fetch (get instruction from memory) D = Decode (work out what it means) E = Execute (carry out the operation)
Notice that after cycle 3, all three pipeline stages are busy simultaneously. This is the steady state of a pipelined CPU - three instructions at different stages, all being processed at once. Without pipelining, only one stage would be active at any given cycle.
What about data hazards?
A data hazard occurs when one instruction needs the result of the previous one before it has finished executing. For example, if I3 needed the value computed by I2 but I2 is still in its Execute stage when I3 reaches Decode, the pipeline must stall - inserting empty cycles (called "bubbles") to wait. Modern CPUs use out-of-order execution to rearrange independent instructions and keep the pipeline full as often as possible.
Exam focus

Pipelining increases throughput (instructions completed per second) but does not reduce the time for any single instruction. A data hazard can cause a pipeline stall. For the exam: without pipelining, N instructions take 3N cycles. With pipelining, N instructions take N+2 cycles (2 cycles to fill the pipeline initially).
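The two exam formulas translate directly into code:

```python
# The exam formulas: without pipelining, each of the three stages (F, D, E)
# takes one cycle per instruction; with pipelining, the pipe takes 2 cycles
# to fill and then completes one instruction every cycle.

def cycles_without_pipelining(n):
    return 3 * n

def cycles_with_pipelining(n):
    return n + 2

for n in (4, 100):
    print(n, cycles_without_pipelining(n), cycles_with_pipelining(n))
# 4 instructions:   12 vs 6 cycles (matching the tables in this lesson)
# 100 instructions: 300 vs 102 cycles - the speedup approaches 3x as n grows
```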

How the factors interact

Processor                       Cores   Clock Speed   L3 Cache   Best for
Intel Core i3 (budget laptop)   4       3.6 GHz       12 MB      Web browsing, office work, light multitasking
Intel Core i7 (mid-range)       16      4.7 GHz       24 MB      Video editing, software development, gaming
Intel Core i9 (high-end)        24      5.8 GHz       36 MB      3D rendering, scientific computing, high-end gaming
Apple M4 (ARM architecture)     10      4.4 GHz       16 MB      Efficiency-focused: performance per watt, longer battery life
The Apple M4 runs at a lower clock speed than the Intel i9 but often matches or beats it in real-world tasks. This is because architecture matters too - Apple's ARM-based chips complete more work per clock cycle than these x86 chips. That measure is called IPC (Instructions Per Clock), and it is why raw GHz is not the full picture.
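The IPC point can be sketched in Python - the IPC values below are invented for illustration, not measured benchmark figures:

```python
# Why GHz alone misleads: effective throughput ~ clock speed x IPC
# (instructions per clock). The IPC figures here are hypothetical
# illustrative values, not real benchmark results.

def instructions_per_second(ghz, ipc):
    return ghz * 1e9 * ipc

intel = instructions_per_second(5.0, 2.0)  # higher clock, lower IPC (hypothetical)
apple = instructions_per_second(4.4, 2.5)  # lower clock, higher IPC (hypothetical)
print(f"Intel: {intel:.1e}/s  Apple: {apple:.1e}/s")
# The 4.4 GHz chip wins: 1.1e10 vs 1.0e10 instructions per second
```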
Lesson 3 Quiz
5 questions on CPU performance
Question 1 of 5
A CPU runs at 4 GHz. How many clock cycles does it complete per second?
Question 2 of 5
Why does adding more cores NOT always improve the performance of every program?
Question 3 of 5
What is the main benefit of larger cache memory in a CPU?
Question 4 of 5
In pipelining, what happens to the next instruction while the current instruction is being decoded?
Question 5 of 5
Why have CPU clock speeds not increased dramatically beyond ~5 GHz despite decades of improvement?
Think deeper

A company advertises a new laptop as having "2x more cores" than its predecessor, but benchmarks show it is only 30% faster in typical use. What might explain this gap between the marketing claim and the real-world result?

Several factors limit the real-world gain from doubling cores: (1) Many everyday applications are single-threaded or lightly multi-threaded and cannot use all cores simultaneously. (2) Amdahl's Law - the theoretical maximum speedup is limited by the proportion of a task that can be parallelised. (3) Other bottlenecks such as RAM bandwidth, storage speed or bus width may prevent the cores from being fully utilised. (4) Clock speed, cache size or architecture may differ between the two models, partially offsetting the core count advantage.
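Amdahl's Law (point 2) can be made concrete with a short calculation - the 50% parallel fraction is an illustrative assumption:

```python
# Amdahl's Law: if a fraction p of a task can be parallelised, the best
# possible speedup on n cores over one core is 1 / ((1 - p) + p / n).

def amdahl_speedup(p, n):
    return 1 / ((1 - p) + p / n)

# Doubling cores from 4 to 8 when only half the work is parallel:
print(amdahl_speedup(0.5, 4))  # 1.6 - speedup over a single core
print(amdahl_speedup(0.5, 8))  # ~1.78 - "2x more cores" gave only ~11% more speed
```

This is exactly the marketing-versus-benchmarks gap in the question: the serial half caps the gain no matter how many cores are added.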
Revision
Computer Systems Flashcards
27 key terms across all 5 lessons. Filter by topic, flip to reveal, mark as known.
Open flashcards
Lesson 3 - Teacher Resources
Why is your CPU faster than mine?
Teacher mode (all pages)
Shows examiner notes on the Exam Practice page
Suggested starter (5 min)
Show two laptop spec sheets side by side: one from 2015 (dual-core, 2.5 GHz, 4 MB cache) and one from today (12-core, 4.8 GHz, 24 MB cache). Ask: "Is the new laptop 12x faster? 2x? Exactly how much faster would it be at sending one email? At rendering a 4K film?" Take answers. Students quickly realise clock speed x cores is not a simple multiplier. This motivates every factor covered in the lesson.
Lesson objectives
1. Explain how clock speed (in GHz) determines the number of FDE cycles per second, and state why doubling the clock speed does not always double real-world performance.
2. Describe what CPU cores are and explain why multi-core processors improve throughput for parallelisable tasks but not for inherently serial programs.
3. Explain how cache memory (L1, L2, L3) reduces average memory access time, and describe what a cache hit and a cache miss mean in practice.
4. Describe pipelining and explain how overlapping FDE stages increases throughput - and identify at least one situation where it does not help (pipeline hazards).
Key vocabulary (board-ready)
Clock speed
The number of cycles the CPU performs per second, measured in GHz (gigahertz = 1 billion cycles/second). A 4 GHz processor performs 4 billion clock cycles per second.
CPU core
An independent processing unit within a CPU chip, each capable of executing its own FDE cycle simultaneously. More cores improve performance for parallel workloads, not the speed of a single serial task.
Cache (L1/L2/L3)
Small, fast memory built into or close to the CPU. L1 is fastest and smallest (32-64 KB per core); L3 is largest and slowest (8-64 MB, shared). Frequently used data is stored here to reduce RAM access time.
Cache hit
When the CPU requests data and finds it already in cache. No RAM access needed - an L1 hit typically costs just ~1-4 cycles (around a nanosecond).
Cache miss
When data the CPU needs is not in cache and must be fetched from RAM (50-100 ns) or SSD (microseconds). The miss penalty is the added delay per missing access.
Pipelining
A technique where the CPU begins fetching the next instruction while decoding the current one, and begins decoding while executing another. Like an assembly line - multiple instructions at different stages simultaneously.
Pipeline hazard
A situation that prevents the pipeline running efficiently. Data hazards: one stage needs output from a stage still in progress. Control hazards: a branch makes the next fetch uncertain.
Suggested lesson plan
0-5 min: Starter: two spec sheets. Students predict relative speed. Keep predictions on the board to revisit at the end.
5-15 min: Clock speed: what a Hz is, FDE cycles per second, thermal limits. Students calculate cycles per second from GHz.
15-25 min: Multi-core: parallelism vs serial code. Amdahl's Law concept for higher. Concrete example: rendering frames (parallelisable) vs sequential tax calculation (serial).
25-40 min: Cache hierarchy: L1/L2/L3, hit/miss rates and penalties. Interactive cache sim - students observe hit/miss patterns as memory access patterns change.
40-52 min: Pipelining: overlapping FDE stages, throughput vs latency, pipeline hazards and branch misprediction.
52-60 min: Revisit starter predictions. Who was right? Exit tickets.
Discussion prompts
Intel's top desktop CPU in 2003 ran at 3.0 GHz. Today's CPUs run at 5.0+ GHz - under 2x faster in clock speed. Yet a modern CPU is 50-100x faster on real workloads. Where does the extra performance come from if not clock speed?
A 12-core CPU runs a spreadsheet calculation in the same time as a 1-core CPU - but renders a video 10x faster. What is different about these two tasks?
Your browser has dozens of tabs open, each with JavaScript running. Is this parallelism at the core level, the thread level, or the process level? Does it matter how many cores your CPU has?
Cache miss rates of 5% sound small. But if 5% of your accesses take 100x longer than cache hits, what is the effect on average performance? Calculate together: 95 percent hits at 1 ns plus 5 percent misses at 100 ns.
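The suggested class calculation, worked in Python:

```python
# Average memory access time for the final discussion prompt:
# 95% of accesses hit at 1 ns, 5% miss and cost 100 ns.

def average_access_ns(hit_rate, hit_ns, miss_ns):
    return hit_rate * hit_ns + (1 - hit_rate) * miss_ns

print(round(average_access_ns(0.95, 1, 100), 2))  # 5.95 - nearly 6x slower than all-hits
```

A good follow-up: rerun with a 99% hit rate (1.99 ns) to show how sensitive average performance is to the miss rate.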
Common misconceptions
✗ "More GHz = faster computer" - clock speed is one factor among many. A 4 GHz CPU with a small cache may be slower than a 3.5 GHz CPU with a large L3 cache on real workloads.
✗ "More cores always means faster" - programs must be written to use multiple cores. Sequential code sees no improvement from additional cores beyond one.
✗ "Cache is just extra RAM" - cache is on-chip (L1/L2 per core) or very close to the chip (L3), operates at near-CPU speed, and is automatically managed by hardware. Qualitatively different from system RAM.
✗ "Pipelining doubles performance" - pipelining improves throughput over a long program but does not reduce the latency of any single instruction. Hazards break the pipeline; real gains are less than theoretical.
Exit ticket questions
A CPU runs at 3 GHz. How many FDE cycles does it complete per second?
[1 mark - 3,000,000,000 / 3 billion]
Explain why doubling the number of CPU cores does not always double program speed.
[2 marks - some code is serial and cannot be parallelised / only the parallelisable portion benefits from extra cores]
What is the difference between a cache hit and a cache miss?
[2 marks - hit: data found in cache, fast access / miss: not in cache, must fetch from RAM, significantly slower]
Describe how pipelining increases CPU throughput.
[2 marks - overlaps fetch/decode/execute of multiple instructions simultaneously / like an assembly line - each stage processes a different instruction at the same time]
Homework idea
Research and compare two real CPUs: the Intel Core i5-13600K and the AMD Ryzen 9 7950X. For each, find: clock speed (base and boost), number of cores, L3 cache size, and approximate retail price. Write two paragraphs: (1) which is better for a video editor and why, (2) which is better for a programmer compiling code one file at a time and why. Justify both answers using factors from this lesson.
Classroom tips
The cache simulation is most effective when you contrast a sequential access pattern (high hit rate) against a random access pattern (high miss rate). Ask students to predict which will be faster before running it.
Amdahl's Law does not need to be taught as a formula at GCSE. The concept - the serial part limits total speedup regardless of core count - is sufficient and very assessable.
Pipeline hazards at GCSE: students only need to know that branches cause problems because the CPU does not know which instruction to fetch next. The phrase "branch misprediction" is enough.
Revisiting the starter spec sheet predictions at the end of the lesson is very effective. Students who changed their mind should explain what they now know that changed their view.