Lesson 3 of 6 · Data & Binary

Character Encoding

How every letter, digit and emoji you type is stored as a binary number. ASCII, Unicode, and the difference between them.

Explain how characters are represented as binary numbers

Describe ASCII including its 7-bit structure and limitations

State one benefit and one drawback of Unicode compared to ASCII

Encode text using an ASCII table

How text becomes numbers

A computer stores everything as binary. Numbers translate directly. But how does it store the letter A? Or the symbol #? Or the word "Hello"?

The answer is a character encoding: an agreed table that assigns a unique number to every character. Every device that uses the same encoding will interpret the same binary pattern as the same character. Without agreement, one computer's "A" would be another computer's "5".
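In Python, the built-in functions ord() and chr() perform this table lookup: ord() gives the number assigned to a character, and chr() goes the other way. Python actually uses Unicode code points, but these match the ASCII codes for values 0 to 127, so for plain English text the numbers are the ASCII codes. A minimal sketch:

```python
# ord() looks up the number assigned to a character;
# chr() turns a number back into its character.
code = ord("A")
print(code)        # 65
print(chr(code))   # A

# Every character in a word has its own number.
word = "Hello"
codes = [ord(ch) for ch in word]
print(codes)       # [72, 101, 108, 108, 111]

# Decoding the numbers with the same table gives the same word back.
print("".join(chr(n) for n in codes))  # Hello
```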

The personal connection: Type your first name into the encoder below. Every time you have typed your name on any digital device, each character was stored as a number, and that number as a binary pattern. This is not a special mode: it is how every text file, every message and every webpage works.

ASCII

ASCII (American Standard Code for Information Interchange) was standardised in 1963. It uses 7 bits per character, giving 2⁷ = 128 possible values (0 to 127).
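A quick Python check of the arithmetic: 7 bits give 2⁷ = 128 patterns, and the 8-bit form normally kept in memory is simply the 7-bit pattern with a 0 in front.

```python
# 7 bits give 2**7 distinct patterns, numbered 0 to 127.
num_values = 2 ** 7
print(num_values)  # 128

# The same character written as its 7-bit pattern and as the
# 8-bit byte it is normally stored in (a leading 0 is added).
code = ord("A")
seven_bit = format(code, "07b")
eight_bit = format(code, "08b")
print(seven_bit)  # 1000001
print(eight_bit)  # 01000001

# Every ASCII code (0 to 127) fits in 7 bits.
print(all(len(format(n, "b")) <= 7 for n in range(num_values)))  # True
```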

Key values to remember: capital A = 65, capital B = 66, ..., capital Z = 90. Lowercase letters follow the same pattern in their own block: a = 97, b = 98, ..., z = 122. The digits also run in order: '0' = 48, '1' = 49, ..., '9' = 57.

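The gap trick: A = 65, a = 97. The difference is exactly 32. To convert an uppercase letter to lowercase in ASCII, add 32. To go back, subtract 32. Digits work in a similar way: because '0' to '9' have consecutive codes, subtracting 48 (the code for '0') from a digit's code gives its numeric value, so '9' (57) - 48 = 9. Letters do not follow straight on from the digits, though: the difference between '0' (48) and 'A' (65) is 17, so there is a gap of unrelated symbols between them.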
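A minimal Python sketch of the gap trick. The helper names to_lower and to_upper are illustrative, and they only handle the letters A-Z and a-z; Python's own str.lower() and str.upper() cover far more characters.

```python
GAP = ord("a") - ord("A")   # 97 - 65 = 32

def to_lower(ch: str) -> str:
    """Convert an uppercase ASCII letter to lowercase by adding 32."""
    if "A" <= ch <= "Z":
        return chr(ord(ch) + GAP)
    return ch  # leave anything else unchanged

def to_upper(ch: str) -> str:
    """Convert a lowercase ASCII letter to uppercase by subtracting 32."""
    if "a" <= ch <= "z":
        return chr(ord(ch) - GAP)
    return ch  # leave anything else unchanged

print(GAP)                                     # 32
print("".join(to_lower(c) for c in "HELLO"))   # hello
print("".join(to_upper(c) for c in "World"))   # WORLD

# Digits '0'..'9' have consecutive codes (48..57),
# so subtracting the code for '0' gives a digit's value.
print(ord("9") - ord("0"))  # 9

# Letters do not follow on from the digits: 'A' - '0' is 17.
print(ord("A") - ord("0"))  # 17
```

In binary, the 32 is a single bit: A = 01000001 and a = 01100001 differ only in the bit worth 32.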

Character	ASCII code	8-bit binary
A	65	01000001
B	66	01000010
M	77	01001101
Z	90	01011010
a	97	01100001
z	122	01111010
0	48	00110000
9	57	00111001
!	33	00100001
space	32	00100000
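The character codes listed above can be regenerated in a few lines of Python, which is a handy way to check any entry. This is a sketch; the list of characters is simply the set used in this lesson.

```python
# Characters from the lesson's list (" " is the space character).
chars = ["A", "B", "M", "Z", "a", "z", "0", "9", "!", " "]

rows = []
for ch in chars:
    code = ord(ch)
    label = "space" if ch == " " else ch
    # format(code, "08b") writes the code as 8 binary digits with leading zeros
    rows.append((label, code, format(code, "08b")))

for label, code, bits in rows:
    print(f"{label}\t{code}\t{bits}")
```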

128 characters are enough for English: all letters (upper and lower case), digits, punctuation, and control characters such as newline and tab. But ASCII was designed for American English in the 1960s, and the world is considerably larger than that.

ASCII limitation: 128 characters cannot cover accented letters (é, ü, ö), non-Latin scripts (Chinese, Arabic, Hindi), currency symbols other than $ (such as £ and €), or any emoji. Systems using only ASCII cannot correctly process text in most of the world's languages.
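The limitation is easy to demonstrate in Python. The helper fits_in_ascii below is illustrative (Python 3.7+ also offers the built-in str.isascii() for the same check); it simply tests whether every character's code is below 128.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character has a code from 0 to 127."""
    return all(ord(ch) < 128 for ch in text)

for word in ["Hello", "caf\u00e9", "\u20ac5", "\u4f60\u597d"]:  # Hello, café, €5, 你好
    print(word, fits_in_ascii(word))

# Trying to encode non-ASCII text as ASCII raises an error in Python.
try:
    "caf\u00e9".encode("ascii")
except UnicodeEncodeError as err:
    print("Cannot encode:", err.reason)  # ordinal not in range(128)
```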
    

Unicode

Unicode was developed to solve ASCII's limitation. It assigns a unique code point to every character in the world's writing systems, currently over 149,000 characters including emoji.

Unicode itself is a catalogue of code points, not one fixed-size encoding. Those code points can be stored using several encodings:

Encoding	Bits per character	Range	Compatibility
UTF-8	8 to 32 (variable)	All Unicode	ASCII compatible, 1 byte for basic English
UTF-16	16 or 32 (variable)	All Unicode	Used internally in many systems
UTF-32	32 (fixed)	All Unicode	4x larger files for ASCII text

UTF-8 is the dominant encoding on the web because it is backwards compatible with ASCII (the first 128 code points are identical) and uses only 1 byte for standard English text.
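The table and the compatibility claim can both be checked in Python by encoding a few sample characters and counting the bytes. The "-le" codec variants are used here only so that Python does not prepend a byte-order mark, which would inflate the counts for UTF-16 and UTF-32.

```python
# Sample characters: A (U+0041), é (U+00E9), € (U+20AC) and the
# "face with tears of joy" emoji (U+1F602).
samples = ["A", "\u00e9", "\u20ac", "\U0001F602"]

# Little-endian variants: no byte-order mark (BOM) is added to the output.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

sizes = {}
for ch in samples:
    sizes[ch] = [len(ch.encode(enc)) for enc in ENCODINGS]
    print(f"U+{ord(ch):04X}  bytes in UTF-8/16/32: {sizes[ch]}")

# Backwards compatibility: English text gives identical bytes in ASCII and UTF-8.
print("Hello".encode("ascii") == "Hello".encode("utf-8"))  # True
```

The byte counts come out as [1, 2, 4] for A, [2, 2, 4] for é, [3, 2, 4] for € and [4, 4, 4] for the emoji: UTF-8 grows with the code point, UTF-16 needs 4 bytes only for characters above U+FFFF, and UTF-32 always uses 4.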

Property	ASCII	Unicode (UTF-8)
Bits per character	7 (stored as 8)	8 to 32
Characters supported	128	Over 149,000
Language support	English only	All major languages + emoji
File size (English text)	1 byte per character	Same (1 byte per character)
File size (non-English)	Cannot store	2-4 bytes per character

Exam phrasing: Benefit of Unicode: "can represent characters from a wider range of languages / more characters". Drawback: "files use more storage / uses more bits per character than ASCII". Both sides must be specific.

Character Encoder


Type any text. See each character's ASCII code and 8-bit binary value instantly. Try your first name.
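How the on-page encoder is built is not shown here; the function below is a hypothetical Python sketch of the same idea (encode_text is an illustrative name). It lists each character with its code and 8-bit binary value, and rejects characters outside 0 to 127 because those have no ASCII code.

```python
def encode_text(text: str) -> list[tuple[str, int, str]]:
    """Return (character, ASCII code, 8-bit binary) for each character in text."""
    rows = []
    for ch in text:
        code = ord(ch)
        if code > 127:
            raise ValueError(f"{ch!r} is not an ASCII character")
        rows.append((ch, code, format(code, "08b")))
    return rows

for ch, code, bits in encode_text("Sam"):
    print(ch, code, bits)
# S 83 01010011
# a 97 01100001
# m 109 01101101
```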

Exam Focus

ASCII questions often give you a partial table (e.g. K=01001011, L=01001100, M=01001101, N=01001110) and ask you to encode a word like "POP". You must work out P's code from the pattern (P=01010000), not from memory.
Benefit and drawback questions require one specific point each. "Unicode is better" scores 0. "Unicode supports more characters / languages" scores the mark.
The drawback of Unicode is file size or processing overhead -- not "it is more complicated". Be precise.
ASCII defines each character using 7 bits, but each character is normally stored in a full byte with the extra 8th bit set to 0, so it takes 1 byte per character. Know both facts.
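The partial-table method from the first exam point can be written out step by step in Python: convert one known code to denary, count how many letters further along the alphabet the target is, add that offset, and convert back to 8-bit binary. The function name code_from_pattern is illustrative, and the sketch assumes capital letters only.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Partial table as given in the question (letter -> 8-bit binary code).
given = {"K": "01001011", "L": "01001100", "M": "01001101", "N": "01001110"}

def code_from_pattern(letter: str, table: dict[str, str]) -> str:
    """Work out a capital letter's 8-bit code from one known entry,
    using the rule that each letter's code is 1 more than the previous one."""
    known_letter, known_bits = next(iter(table.items()))
    known_code = int(known_bits, 2)   # binary -> denary, e.g. 01001011 -> 75
    # How many places further along the alphabet the target letter is.
    offset = ALPHABET.index(letter) - ALPHABET.index(known_letter)
    return format(known_code + offset, "08b")

print([code_from_pattern(ch, given) for ch in "POP"])
# ['01010000', '01001111', '01010000']

# With only K given, the same method encodes "MOM".
print([code_from_pattern(ch, {"K": "01001011"}) for ch in "MOM"])
# ['01001101', '01001111', '01001101']
```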

Check your understanding

1. Using the pattern K=01001011, L=01001100, M=01001101, N=01001110, what is the binary code for the letter P?

01001111

01010000

01001110

01010010

K=75, L=76, M=77, N=78. The pattern continues: O=79 (01001111), P=80 (01010000). Each letter increments the binary code by 1.

2. How many different characters can ASCII represent?

256 (2⁸)

128 (2⁷)

64 (2⁶)

512 (2⁹)

ASCII uses 7 bits per character. 2⁷ = 128 possible values (0 to 127).

3. State one benefit of using Unicode instead of ASCII.

Unicode is simpler to implement than ASCII

Unicode uses fewer bits per character

Unicode can represent characters from a much wider range of languages

Unicode files are always smaller

The key benefit is character range: over 149,000 characters covering all major world scripts, plus emoji. ASCII only supports 128 characters (primarily English).

4. State one drawback of using Unicode instead of ASCII.

Unicode cannot represent English characters

Unicode uses more bits per character, resulting in larger file sizes for some text

Unicode is not compatible with any existing systems

Unicode cannot represent numbers

The drawback is storage: Unicode (especially UTF-16/UTF-32) uses more bits per character than ASCII's 7/8 bits, so files are larger. For English-only text, UTF-8 is the same size as ASCII.

5. A text file contains 500 characters, all within the standard ASCII range. How many bytes of storage does it require in ASCII?

437.5 bytes (500 × 7 bits = 3500 bits)

500 bytes (500 × 8 bits = 4000 bits = 500 bytes)

1000 bytes (500 × 16 bits)

2000 bytes (500 × 32 bits)

ASCII characters are stored as 8 bits (1 byte) each, even though only 7 bits are needed. 500 characters × 1 byte = 500 bytes.
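The same calculation as a small Python sketch (storage_bytes is an illustrative name): multiply characters by bits per character, then divide by 8 bits per byte. It also shows where the 437.5-byte distractor comes from, and cross-checks the answer against real ASCII-encoded text.

```python
def storage_bytes(num_chars: int, bits_per_char: int = 8) -> float:
    """Storage needed for num_chars characters at bits_per_char bits each."""
    total_bits = num_chars * bits_per_char
    return total_bits / 8  # 8 bits per byte

print(storage_bytes(500))      # 500.0  (ASCII as actually stored: 8 bits each)
print(storage_bytes(500, 7))   # 437.5  (the 7-bit distractor)

# Cross-check: 500 ASCII characters encode to 500 bytes.
text = "a" * 500
print(len(text.encode("ascii")))  # 500
```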

Think Deeper

Unicode includes over 3,500 emoji (such as face with tears of joy, fire, heart). Why does encoding emoji require more than 1 byte in UTF-8, and what does this mean for storage of emoji-heavy text?

Every emoji code point lies well above U+007F (127): many sit at U+1F300 and above, and some older symbols, such as the heart ❤ (U+2764), are lower but still outside the ASCII range. UTF-8 uses 3 bytes for code points from U+0800 to U+FFFF and 4 bytes from U+10000 upward, so each emoji takes 3 or 4 bytes rather than 1. A message of 100 emoji therefore uses roughly 300-400 bytes rather than 100 bytes, and more if emoji are built from several code points. For social media platforms storing billions of emoji-heavy messages, this extra storage adds up.
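The byte counts can be checked in Python. The heart here is the bare U+2764 character; on many devices it is followed by an extra invisible code point (U+FE0F) that adds another 3 bytes, which this sketch leaves out.

```python
# Face with tears of joy, fire, and heart (bare U+2764, no variation selector).
emoji = {
    "face with tears of joy": "\U0001F602",
    "fire": "\U0001F525",
    "heart": "\u2764",
}

for name, ch in emoji.items():
    print(f"{name}: U+{ord(ch):04X}, {len(ch.encode('utf-8'))} bytes in UTF-8")

message = "\U0001F602" * 100   # 100 laughing-face emoji
print(len(message), "characters,", len(message.encode("utf-8")), "bytes")
# 100 characters, 400 bytes
```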

Before Unicode, different countries used different 8-bit extensions of ASCII (ISO-8859-1 for Western European languages, Windows-1251 for Cyrillic, etc.). What problems did this cause for international communication? How did Unicode solve them?

Each regional encoding assigned different characters to the upper 128 codes (128-255). A document created with one encoding would display garbled characters on a system using a different encoding; this is called "mojibake". Email and webpages from different countries were often unreadable. Unicode provided a single globally agreed set of code points so the same number always means the same character, regardless of where the text was created. UTF-8 replaced dozens of incompatible standards and is now used by over 98% of websites.
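Python ships codecs for both of the regional encodings named in the question, so mojibake can be reproduced directly: the same byte means a different character depending on which table is used to read it.

```python
# One byte, 0xE9, read through two different 8-bit code pages.
raw = bytes([0xE9])
print(raw.decode("iso-8859-1"))  # é  (Western European)
print(raw.decode("cp1251"))      # й  (Cyrillic, Windows-1251)

# The same problem today: UTF-8 bytes misread as ISO-8859-1.
utf8_bytes = "caf\u00e9".encode("utf-8")   # "café": é becomes two bytes, 0xC3 0xA9
print(utf8_bytes.decode("iso-8859-1"))     # cafÃ©  (mojibake)
print(utf8_bytes.decode("utf-8"))          # café   (correct)
```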

Next Lesson

Representing Images


Printable Worksheets

Practice what you've learned

Three printable worksheets covering character encoding at three levels: Recall, Apply, and Exam-style.

Recall

Worksheet 1

Key term matching + ASCII table completion + True/False • 18 marks

Apply

Worksheet 2

ASCII encoding + file size calculations + comparison table • 17 marks

Exam-style

Worksheet 3

Extended encoding, file size and Unicode evaluation questions • 20 marks

Exam Practice

Lesson 3: Character Encoding

5 MCQ with instant feedback + 3 written questions with mark schemes.


Teacher Panel

Lesson 3 -- Character Encoding

Lesson Objectives

Explain that character encoding assigns a unique binary number to each character

State that ASCII uses 7 bits (128 characters) with key values: A=65, a=97, 0=48

State the limitation of ASCII (English only, cannot represent global languages)

State one benefit (more characters/languages) and one drawback (larger files) of Unicode

Encode a short word using a given ASCII table in an exam context

Timing Guide

0-5 min: "Encode your name" activity using the tool -- everyone starts engaged

5-12 min: ASCII structure -- 7 bits, 128 chars, key values (A=65, a=97, 0=48)

12-18 min: ASCII limitations -- why 128 is not enough for global communication

18-25 min: Unicode -- UTF-8, variable width, benefit vs drawback

25-30 min: Exam question practice -- given ASCII table, encode a word

Common Misconceptions

"ASCII uses 8 bits" -- ASCII is defined as 7-bit (128 characters). It is stored in 8 bits with a leading zero, but is a 7-bit system.

"Unicode replaces binary" -- Unicode IS a binary encoding. Each code point is still stored as bits.

"Unicode files are always bigger" -- UTF-8 Unicode files for English text are exactly the same size as ASCII files (1 byte per character).

Students often treat the character code (the number) and its binary representation as two different things. The denary number 65 and the binary 01000001 are the same value written in two number bases.

Marking Guidance

ASCII encoding from a table (2 marks): The question will give a partial table and ask you to encode a word. Students must use the arithmetic pattern (each letter increments by 1) not just memory. Show the code for each letter separately.

Benefit/drawback questions (1+1 marks): "Unicode is better" = 0. "Unicode can represent characters from more languages" = 1 mark. "Unicode uses more storage per character" = 1 mark. Specificity is everything.

The "Encode Your Name" Activity

5 minutes at the start. Students type their first name into the encoder. Discussion: is your name the same in binary in every country? (Yes, for ASCII names -- this is the whole point of agreed standards.) What about students with non-ASCII names? (They cannot be stored in ASCII -- this is the problem Unicode solves.)

This personal hook makes the abstract concept of encoding immediately concrete and memorable.

Exit Tickets

Given K=01001011, encode the word "MOM" using the ASCII pattern. [2 marks]

State one benefit and one drawback of Unicode compared to ASCII. [2 marks]

How many bits does ASCII use per character? How many characters can it represent? [2 marks]

Differentiation

Grade 4 Encode from a given ASCII table. State ASCII uses 7 bits. One benefit and one drawback of Unicode.

Grade 7 All of Grade 4 plus: calculate file size in bytes from character count. Explain why ASCII fails for global languages.

Grade 9 Compare UTF-8, UTF-16, UTF-32. Explain why UTF-8 dominates the web. Discuss why agreeing on a single standard was important for international communication.

Resources

Worksheet 1: Character Encoding Recall

Worksheet 2: Character Encoding Apply

Worksheet 3: Exam-style Questions

Exam Paper: Lesson 3

Binary Series Overview