Why is your CPU faster than mine?
Clock speed, cores, cache size and pipelining. Four factors that determine how quickly a processor can work - and each involves a genuine engineering trade-off.
Compare two processors - an Intel Core i3 at 3.6 GHz and an Intel Core i9 at 5.0 GHz. The i9 has a higher clock speed. But the i9 also costs four times as much. Is it four times as fast? And why does an Apple M4 at 4.4 GHz often outperform an Intel running at 5.0 GHz?
What actually affects CPU performance?
Clock Speed
The clock speed (measured in GHz) determines how many fetch-decode-execute (FDE) cycles the CPU can complete per second. 1 GHz = 1 billion cycles per second. If one instruction completes every cycle, a 4 GHz CPU can theoretically process 4 billion instructions per second.
The limit: Increasing clock speed produces heat. Beyond around 5-6 GHz, modern chips generate so much heat they become unreliable unless cooled with extreme methods. This is why clock speeds have not increased dramatically since 2005 - engineers found other ways to improve performance instead.
Number of Cores
Instead of making one core faster, chip designers added more cores - each an independent processor that can fetch, decode and execute its own instructions simultaneously.
The catch: Multiple cores only help if the software is written to use them (multi-threaded). A single-threaded program runs on one core only - adding more cores gives it no benefit at all. Video editing, 3D rendering and scientific simulations benefit enormously from many cores. Opening a single spreadsheet barely uses more than one core.
Cache Size
Fetching data from RAM is slow relative to CPU speed. Cache stores recently used data much closer to the CPU core, reducing how often the processor has to wait. A larger cache means more data can be held nearby - fewer cache misses, fewer stalls.
The limit: Cache is extremely expensive to manufacture per byte - far more than RAM. Large caches also increase chip size and power consumption. The design becomes a careful balance between hit rate, cost, size and heat.
Pipelining
Without pipelining, the CPU finishes all three FDE stages of instruction 1 before starting instruction 2. With pipelining, while instruction 1 is being decoded, instruction 2 is already being fetched. Each stage works on a different instruction simultaneously - like an assembly line in a factory.
The complication: If an instruction's result is needed by the next instruction (a data hazard), the pipeline must stall and wait. Modern CPUs use sophisticated techniques such as out-of-order execution to minimise these stalls, and branch prediction to avoid similar stalls when the next instruction to fetch depends on the outcome of a branch.
Clock speed in numbers
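The conversion between GHz, cycles per second and the length of one cycle can be sketched in a few lines of Python. The function names here are illustrative only, and the sketch covers just the clock itself: it says nothing about how many instructions a real chip completes in each cycle.

```python
def cycles_per_second(clock_ghz: float) -> float:
    """Convert a clock speed in GHz to clock cycles per second."""
    return clock_ghz * 1_000_000_000


def cycle_time_ns(clock_ghz: float) -> float:
    """Length of one clock cycle in nanoseconds (1 GHz -> 1 ns per cycle)."""
    return 1 / clock_ghz


# Clock speeds taken from the examples in this lesson.
for ghz in (1.0, 3.6, 5.0):
    print(f"{ghz} GHz = {cycles_per_second(ghz):,.0f} cycles/s, "
          f"one cycle lasts {cycle_time_ns(ghz):.2f} ns")
```

At 5.0 GHz a single cycle lasts only 0.2 nanoseconds, which gives a sense of why waiting on slower memory is so costly.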
Cache hits, cache misses - and why they matter
When explaining how cache improves performance: say "reduces the number of slow accesses to main memory" and "stores recently/frequently used data closer to the CPU." A cache miss means the CPU must wait while data is fetched from the next level. A larger cache reduces the frequency of cache misses.
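To see why a larger cache reduces misses, here is a minimal sketch of a cache simulation. It assumes a least-recently-used replacement policy and a made-up access pattern. Real CPU caches are organised differently and use a variety of policies, so treat this as an illustration of the idea rather than a model of any actual chip.

```python
from collections import OrderedDict


def count_hits_and_misses(addresses, cache_size):
    """Simulate a tiny least-recently-used cache; return (hits, misses)."""
    cache = OrderedDict()  # cached addresses, least recently used first
    hits = misses = 0
    for address in addresses:
        if address in cache:
            hits += 1
            cache.move_to_end(address)     # mark as most recently used
        else:
            misses += 1                    # slow trip to main memory
            cache[address] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used
    return hits, misses


# Hypothetical access pattern: a loop that keeps revisiting four addresses.
accesses = [0, 1, 2, 3] * 5

print(count_hits_and_misses(accesses, cache_size=2))  # (0, 20)
print(count_hits_and_misses(accesses, cache_size=4))  # (16, 4)
```

With room for only two addresses, every access is a miss because each value is evicted before it is needed again. Once all four addresses fit, only the first pass misses.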
More cores: when it helps and when it does not
Benefits from more cores:
- Video rendering / encoding
- 3D modelling and animation
- Scientific simulations
- Compiling large codebases
- Running virtual machines
Little or no benefit from more cores:
- Simple web browsing
- Spreadsheet calculation
- Many older games
- Sequential data processing
- Most command-line scripts
The key phrase is: "Multiple cores allow multiple instruction streams to execute simultaneously, which improves performance for multi-threaded applications. Single-threaded programs cannot benefit from additional cores as they can only use one core at a time."
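One common way to put numbers on this is Amdahl's law, a standard idealised model that this lesson does not otherwise cover. The sketch below assumes a program can be split cleanly into a part that must run on one core and a part that spreads perfectly across all cores. The fractions are illustrative, not measurements.

```python
def speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: overall speedup when only part of a program uses extra cores."""
    serial_fraction = 1 - parallel_fraction
    return 1 / (serial_fraction + parallel_fraction / cores)


# 0% parallel is a purely single-threaded program.
for fraction in (0.0, 0.5, 0.95):
    for cores in (1, 4, 16):
        print(f"{fraction:.0%} parallel on {cores:>2} cores: "
              f"{speedup(fraction, cores):.2f}x")
```

A purely single-threaded program stays at 1.00x however many cores are added, while a program that is half parallel reaches only 1.60x on four cores.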
How pipelining overlaps instructions
Without pipelining - 12 clock cycles for 4 instructions:
| Instruction | Cycle 1 | Cycle 2 | Cycle 3 | Cycle 4 | Cycle 5 | Cycle 6 | Cycle 7 | Cycle 8 | Cycle 9 | Cycle 10 | Cycle 11 | Cycle 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| I1 | F | D | E | - | - | - | - | - | - | - | - | - |
| I2 | - | - | - | F | D | E | - | - | - | - | - | - |
| I3 | - | - | - | - | - | - | F | D | E | - | - | - |
| I4 | - | - | - | - | - | - | - | - | - | F | D | E |
With pipelining - 6 clock cycles for 4 instructions:
| Instruction | Cycle 1 | Cycle 2 | Cycle 3 | Cycle 4 | Cycle 5 | Cycle 6 |
|---|---|---|---|---|---|---|
| I1 | F | D | E | - | - | - |
| I2 | - | F | D | E | - | - |
| I3 | - | - | F | D | E | - |
| I4 | - | - | - | F | D | E |
Pipelining increases throughput (instructions completed per second) but does not reduce the time for any single instruction. A data hazard can cause a pipeline stall. For the exam: without pipelining, N instructions take 3N cycles. With pipelining, N instructions take N+2 cycles (2 cycles to fill the pipeline initially).
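The two formulas, and the overlapped timeline from the table, can be reproduced with a short Python sketch. It assumes an ideal three-stage pipeline with no stalls, where each stage takes one clock cycle. The function names are illustrative.

```python
STAGES = ["F", "D", "E"]  # fetch, decode, execute


def cycles_without_pipelining(n: int) -> int:
    """Each instruction finishes all three stages before the next starts: 3N."""
    return len(STAGES) * n


def cycles_with_pipelining(n: int) -> int:
    """Ideal pipeline with no stalls: N + 2 cycles (0 if there is nothing to run)."""
    return n + len(STAGES) - 1 if n > 0 else 0


def pipeline_timeline(n: int) -> list[list[str]]:
    """One row per instruction, one column per cycle, as in the table above."""
    total = cycles_with_pipelining(n)
    rows = []
    for i in range(n):
        row = ["-"] * total
        for offset, stage in enumerate(STAGES):
            row[i + offset] = stage  # instruction i starts in cycle i + 1
        rows.append(row)
    return rows


print(cycles_without_pipelining(4), cycles_with_pipelining(4))  # 12 6
for number, row in enumerate(pipeline_timeline(4), start=1):
    print(f"I{number}: " + " ".join(row))
```

A real pipeline would insert extra idle cycles whenever a data hazard forces a stall, so N+2 is a best case.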
How the factors interact
| Processor | Cores | Clock Speed | L3 Cache | Best for |
|---|---|---|---|---|
| Intel Core i3 (budget laptop) | 4 | 3.6 GHz | 12 MB | Web browsing, office work, light multitasking |
| Intel Core i7 (mid-range) | 16 | 4.7 GHz | 24 MB | Video editing, software development, gaming |
| Intel Core i9 (high-end) | 24 | 5.8 GHz | 36 MB | 3D rendering, scientific computing, high-end gaming |
| Apple M4 (ARM architecture) | 10 | 4.4 GHz | 16 MB | Efficiency-focused: performance per watt, longer battery life |
A company advertises a new laptop as having "2x more cores" than its predecessor, but benchmarks show it is only 30% faster in typical use. What might explain this gap between the marketing claim and the real-world result?
Practice what you've learned
Three worksheets on CPU performance factors at three levels: Recall, Apply, and Exam-style.