Lesson 3 of 6 · Data & Binary

Character Encoding

How every letter, digit and emoji you type is stored as a binary number. ASCII, Unicode, and the difference between them.

Explain how characters are represented as binary numbers
Describe ASCII including its 7-bit structure and limitations
State one benefit and one drawback of Unicode compared to ASCII
Encode text using an ASCII table

How text becomes numbers

A computer stores everything as binary. Numbers translate directly. But how does it store the letter A? Or the symbol #? Or the word "Hello"?

The answer is a character encoding: an agreed table that assigns a unique number to every character. Every device that uses the same encoding will interpret the same binary pattern as the same character. Without agreement, one computer's "A" would be another computer's "5".
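In Python, the built-in functions `ord()` and `chr()` give direct access to this agreed table. `ord()` turns a character into its number, and `chr()` turns a number back into a character. `ord()` returns the Unicode code point, which matches the ASCII code for the first 128 characters. The helper names below are our own, chosen for illustration.

```python
def text_to_codes(text: str) -> list[int]:
    """Look up the agreed number for each character."""
    return [ord(ch) for ch in text]


def codes_to_text(codes: list[int]) -> str:
    """Turn the numbers back into characters using the same agreed table."""
    return "".join(chr(c) for c in codes)


codes = text_to_codes("Hello")
print(codes)                 # [72, 101, 108, 108, 111]
print(codes_to_text(codes))  # Hello
```

The two functions only reverse each other because both sides use the same table. That is the whole point of an agreed encoding.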

The personal connection: Type your first name into the encoder below. Every character in it is stored as a binary number, and that has been true every time you have typed your name on any digital device. This is not a special mode. It is how every text file, message and webpage works.

ASCII

ASCII (American Standard Code for Information Interchange) was standardised in 1963. It uses 7 bits per character, giving 2^7 = 128 possible values (0 to 127).
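The 128 comes from counting bit patterns. Each extra bit doubles the number of patterns, so n bits give 2^n values. This is a small sketch; `values_for` is a made-up name used only for illustration.

```python
def values_for(bits: int) -> int:
    """Number of distinct patterns that `bits` binary digits can make."""
    return 2 ** bits


print(values_for(7))            # 128 codes, numbered 0 to 127
print(values_for(7) - 1)        # 127, the largest ASCII code
print(format(127, "07b"))       # 1111111 (the largest code still fits in 7 bits)
print(format(ord("A"), "07b"))  # 1000001 (65 written in 7 bits)
```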

Key values to remember: capital A = 65, B = 66, ..., Z = 90. Lowercase letters follow the same pattern in their own block: a = 97, b = 98, ..., z = 122. The digit characters run from '0' = 48 to '9' = 57.

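The gap trick: A = 65, a = 97. The difference is exactly 32, and the gap is the same for every letter. To convert an uppercase letter to lowercase in ASCII, add 32. To go back, subtract 32. The digits are also consecutive, so subtracting the code for '0' (48) from a digit character gives its numeric value: '9' is 57, and 57 - 48 = 9. The trick stops working past the digits. '0' (48) and 'A' (65) are 17 apart, so 'A' - '0' gives 17, not 10.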
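The sketch below applies the 32 gap to letters and shows what subtracting the code for '0' does, both for a digit character and for 'A'. The helper names are our own, and the helpers deliberately handle only the basic A-Z, a-z and 0-9 ranges.

```python
CASE_GAP = ord("a") - ord("A")  # 32 for every letter in ASCII


def to_lower(ch: str) -> str:
    """Convert one uppercase ASCII letter to lowercase by adding 32."""
    if "A" <= ch <= "Z":
        return chr(ord(ch) + CASE_GAP)
    return ch  # leave anything that is not A-Z unchanged


def to_upper(ch: str) -> str:
    """Convert one lowercase ASCII letter to uppercase by subtracting 32."""
    if "a" <= ch <= "z":
        return chr(ord(ch) - CASE_GAP)
    return ch


def digit_value(ch: str) -> int:
    """'0' to '9' are consecutive codes, so subtracting ord('0') gives the value."""
    if not "0" <= ch <= "9":
        raise ValueError(f"{ch!r} is not a digit character")
    return ord(ch) - ord("0")


print(CASE_GAP)             # 32
print(to_lower("M"))        # m
print(to_upper("q"))        # Q
print(digit_value("9"))     # 9
print(ord("A") - ord("0"))  # 17, not 10
```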

Character  ASCII code  Binary (8-bit)
A          65          01000001
B          66          01000010
M          77          01001101
Z          90          01011010
a          97          01100001
z          122         01111010
0          48          00110000
9          57          00111001
!          33          00100001
space      32          00100000

128 characters is enough for English: all letters (upper and lower), digits, punctuation, and control characters like newline and tab. But it was designed for American English in the 1960s -- and the world is considerably larger than that.

ASCII limitation: 128 characters cannot cover accented letters (é, ü, ö), non-Latin scripts (such as Chinese, Arabic, or the Devanagari script used for Hindi), currency symbols such as £ and €, or any emoji. Systems that use only ASCII cannot correctly process text in most of the world's languages.
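Python makes the limitation easy to see. Asking for an ASCII encoding of a word that contains an accented letter raises an error. This is a minimal sketch: `fits_in_ascii` is our own helper, and Python 3.7+ also has a built-in `str.isascii()` that does the same check.

```python
def fits_in_ascii(text: str) -> bool:
    """True if every character has a code in the 0-127 ASCII range."""
    return all(ord(ch) < 128 for ch in text)


print(fits_in_ascii("Hello"))  # True
print(fits_in_ascii("café"))   # False

try:
    "café".encode("ascii")
except UnicodeEncodeError as err:
    # err.object is the text being encoded; err.start is the failing position
    print("Cannot encode:", err.object[err.start])  # é
```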

Unicode

Unicode was developed to solve ASCII's limitation. It aims to assign a unique code point to every character in the world's writing systems, and it currently defines over 149,000 characters, including emoji.

Unicode itself is a list of code points, not one fixed-size encoding. Those code points can be stored using several encodings:

Encoding  Bits per character   Range        Notes
UTF-8     8 to 32 (variable)   All Unicode  ASCII compatible; 1 byte for basic English
UTF-16    16 or 32 (variable)  All Unicode  Used internally in many systems
UTF-32    32 (fixed)           All Unicode  4x larger files for ASCII text

UTF-8 is the dominant encoding on the web because it is backwards compatible with ASCII (the first 128 code points are identical) and uses only 1 byte for standard English text.
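You can check the byte counts in the table by encoding single characters in Python. The "-le" (little-endian) codec names are used on purpose. The plain "utf-16" and "utf-32" codecs add a byte order mark (BOM) to the front, which would inflate the counts. `byte_sizes` is our own helper.

```python
def byte_sizes(ch: str) -> tuple[int, int, int]:
    """Bytes needed for one character in UTF-8, UTF-16 and UTF-32."""
    # "-le" variants are used so Python does not prepend a byte order mark
    return (len(ch.encode("utf-8")),
            len(ch.encode("utf-16-le")),
            len(ch.encode("utf-32-le")))


for ch in ["A", "é", "€", "😂"]:
    print(ch, f"U+{ord(ch):04X}", byte_sizes(ch))
# A U+0041 (1, 2, 4)
# é U+00E9 (2, 2, 4)
# € U+20AC (3, 2, 4)
# 😂 U+1F602 (4, 4, 4)

# Backwards compatibility: plain English text gives identical bytes
print("Hello".encode("utf-8") == "Hello".encode("ascii"))  # True
```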

Property                  ASCII             Unicode (UTF-8)
Bits per character        7 (stored as 8)   8 to 32
Characters supported      128               Over 149,000
Language support          English only      All major languages + emoji
File size (English text)  1 byte per char   Same (1 byte per char in the ASCII range)
File size (non-English)   Cannot store      2 to 4 bytes per character
Exam phrasing: Benefit of Unicode: "can represent characters from a wider range of languages / more characters". Drawback: "files use more storage / uses more bits per character than ASCII". Both sides must be specific.

Character Encoder

Type any text. See each character's ASCII code and 8-bit binary value instantly. Try your first name.
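Here is a minimal Python sketch of what an encoder like this does. It is our own helper `encode_text`, not the tool's actual code. It also flags characters outside the ASCII range instead of inventing a code for them.

```python
def encode_text(text: str) -> list[tuple[str, int, str]]:
    """Return (character, code, 8-bit binary) for each character in text."""
    rows = []
    for ch in text:
        code = ord(ch)
        if code > 127:
            # Outside ASCII: there is no 7-bit code, so flag it instead
            rows.append((ch, code, "not ASCII"))
        else:
            rows.append((ch, code, format(code, "08b")))
    return rows


for ch, code, bits in encode_text("Sam"):
    print(ch, code, bits)
# S 83 01010011
# a 97 01100001
# m 109 01101101
```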

Exam Focus
  • ASCII questions often give you a partial table (e.g. K=01001011, L=01001100, M=01001101, N=01001110) and ask you to encode a word like "POP". You must work out P's code from the pattern (P=01010000), not from memory.
  • Benefit and drawback questions require one specific point each. "Unicode is better" scores 0. "Unicode supports more characters / languages" scores the mark.
  • The drawback of Unicode is file size or processing overhead -- not "it is more complicated". Be precise.
  • ASCII stores characters in 7 bits but typically uses an 8th bit as padding, so 1 byte per character. Know both facts.
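The "work it out from the pattern" method can be written down as a procedure. Count how many places the target letter is from a letter you were given, then add that many to the given code. This sketch is our own illustration. It counts positions in the alphabet rather than looking up the ASCII table, which is how you would do it in an exam.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def code_from_pattern(known_letter: str, known_bits: str, target: str) -> str:
    """Work out target's 8-bit code from one known letter using the +1 pattern."""
    steps = ALPHABET.index(target) - ALPHABET.index(known_letter)  # K to P is 5
    return format(int(known_bits, 2) + steps, "08b")


print(" ".join(code_from_pattern("K", "01001011", ch) for ch in "POP"))
# 01010000 01001111 01010000
```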

Check your understanding

1. Using the pattern K=01001011, L=01001100, M=01001101, N=01001110, what is the binary code for the letter P?
01001111
01010000
01001110
01010010
K=75, L=76, M=77, N=78. The pattern continues: O=79 (01001111), P=80 (01010000). Each letter increments the binary code by 1.
2. How many different characters can ASCII represent?
256 (2^8)
128 (2^7)
64 (2^6)
512 (2^9)
ASCII uses 7 bits per character. 2^7 = 128 possible values (0 to 127).
3. State one benefit of using Unicode instead of ASCII.
Unicode is simpler to implement than ASCII
Unicode uses fewer bits per character
Unicode can represent characters from a much wider range of languages
Unicode files are always smaller
The key benefit is character range: over 149,000 characters covering all major world scripts, plus emoji. ASCII only supports 128 characters (primarily English).
4. State one drawback of using Unicode instead of ASCII.
Unicode cannot represent English characters
Unicode uses more bits per character, resulting in larger file sizes for some text
Unicode is not compatible with any existing systems
Unicode cannot represent numbers
The drawback is storage: Unicode (especially UTF-16/UTF-32) uses more bits per character than ASCII's 7/8 bits, so files are larger. For English-only text, UTF-8 is the same size as ASCII.
5. A text file contains 500 characters, all within the standard ASCII range. How many bytes of storage does it require in ASCII?
437.5 bytes (500 × 7 bits = 3500 bits)
500 bytes (500 × 8 bits = 4000 bits = 500 bytes)
1000 bytes (500 × 16 bits)
2000 bytes (500 × 32 bits)
ASCII characters are stored as 8 bits (1 byte) each, even though only 7 bits are needed. 500 characters × 1 byte = 500 bytes.
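The calculation is always characters × bits per character ÷ 8. The sketch below wraps it in a hypothetical helper, `file_size_bytes`. It then checks the result against real encodings of 500 ASCII-range characters. UTF-32 uses the "-le" codec so that no byte order mark is added.

```python
def file_size_bytes(num_chars: int, bits_per_char: int) -> float:
    """File size in bytes = characters x bits per character / 8."""
    return num_chars * bits_per_char / 8


print(file_size_bytes(500, 8))   # 500.0  (ASCII, 1 byte per character)
print(file_size_bytes(500, 16))  # 1000.0 (UTF-16, ASCII-range text)
print(file_size_bytes(500, 32))  # 2000.0 (UTF-32)

# Cross-check with real encodings of 500 ASCII-range characters
text = "x" * 500
print(len(text.encode("ascii")), len(text.encode("utf-8")), len(text.encode("utf-32-le")))
# 500 500 2000
```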

Think Deeper

Unicode includes over 3,500 emoji (such as face with tears of joy, fire, heart). Why does encoding emoji require more than 1 byte in UTF-8, and what does this mean for storage of emoji-heavy text?
Emoji code points are well above U+007F (127), often in the range U+1F600 and above. UTF-8 encodes these using 3 or 4 bytes each rather than 1, so a message of 100 emoji uses roughly 300-400 bytes rather than 100 bytes. For social media platforms storing billions of messages with heavy emoji use, this storage cost is significant.
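You can measure this directly by encoding a string and counting the bytes. In the sketch below (`utf8_bytes` is our own helper), 😂 (U+1F602) and 🔥 (U+1F525) take 4 bytes each, and the plain heart symbol U+2764 takes 3. The red-heart emoji is usually typed as U+2764 followed by the variation selector U+FE0F, which is two code points and 6 bytes. So real emoji-heavy text can use even more than 4 bytes per visible symbol.

```python
def utf8_bytes(text: str) -> int:
    """Number of bytes the text occupies when stored as UTF-8."""
    return len(text.encode("utf-8"))


message = "😂🔥" * 50              # 100 emoji
print(len(message))               # 100 code points
print(utf8_bytes(message))        # 400 bytes, not 100

print(utf8_bytes("\u2764"))        # 3 (heart symbol, one code point)
print(utf8_bytes("\u2764\ufe0f"))  # 6 (heart + variation selector)
```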
Before Unicode, different countries used different 8-bit extensions of ASCII (ISO-8859-1 for Western European, Windows-1251 for Cyrillic, etc.). What problems did this cause for international communication? How did Unicode solve them?
Each regional encoding assigned different characters to the upper 128 values (128-255). A document created with one encoding would display garbled characters on a system using a different encoding -- this was called "mojibake". Email and webpages from different countries were often unreadable. Unicode provided a single globally agreed set of code points so the same number always means the same character, regardless of where the text was created. UTF-8 replaced dozens of incompatible standards and is now used by over 98% of websites.
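Mojibake is easy to reproduce in Python by decoding bytes with the wrong table. In the sketch below, the single byte 0xE9 is "é" in ISO-8859-1 but "й" in Windows-1251. The UTF-8 bytes for "café" also come out garbled when they are read as Latin-1. The codec names are Python's standard ones.

```python
# One byte, two regional 8-bit tables, two different characters
byte = bytes([0xE9])
print(byte.decode("iso-8859-1"))  # é  (Western European)
print(byte.decode("cp1251"))      # й  (Cyrillic, Windows-1251)

# Text saved as UTF-8 but opened with the wrong table
raw = "café".encode("utf-8")
print(raw)                        # b'caf\xc3\xa9'
print(raw.decode("latin-1"))      # cafÃ©  (mojibake)
print(raw.decode("utf-8"))        # café   (same table on both sides)
```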
Next Lesson
Representing Images
Printable Worksheets

Practice what you've learned

Three printable worksheets covering character encoding at three levels: Recall, Apply, and Exam-style.

Recall
Worksheet 1
Key term matching + ASCII table completion + True/False • 18 marks
Apply
Worksheet 2
ASCII encoding + file size calculations + comparison table • 17 marks
Exam-style
Worksheet 3
Extended encoding, file size and Unicode evaluation questions • 20 marks
Exam Practice
Lesson 3: Character Encoding
5 MCQ with instant feedback + 3 written questions with mark schemes.
Teacher Panel
Lesson 3 -- Character Encoding
Lesson Objectives
Explain that character encoding assigns a unique binary number to each character
State that ASCII uses 7 bits (128 characters) with key values: A=65, a=97, 0=48
State the limitation of ASCII (English only, cannot represent global languages)
State one benefit (more characters/languages) and one drawback (larger files) of Unicode
Encode a short word using a given ASCII table in an exam context
Timing Guide
0-5 min: "Encode your name" activity using the tool -- everyone starts engaged
5-12 min: ASCII structure -- 7 bits, 128 chars, key values (A=65, a=97, 0=48)
12-18 min: ASCII limitations -- why 128 is not enough for global communication
18-25 min: Unicode -- UTF-8, variable width, benefit vs drawback
25-30 min: Exam question practice -- given ASCII table, encode a word
Common Misconceptions
"ASCII uses 8 bits" -- ASCII is defined as 7-bit (128 characters). It is stored in 8 bits with a leading zero, but is a 7-bit system.
"Unicode replaces binary" -- Unicode IS a binary encoding. Each code point is still stored as bits.
"Unicode files are always bigger" -- UTF-8 Unicode files for English text are exactly the same size as ASCII files (1 byte per character).
Students often confuse character code (the number) with the binary representation. The number 65 and the binary 01000001 are the same value -- not two separate things.
Marking Guidance

ASCII encoding from a table (2 marks): The question will give a partial table and ask students to encode a word. Students must use the arithmetic pattern (each letter increments by 1), not just memory. Show the code for each letter separately.

Benefit/drawback questions (1+1 marks): "Unicode is better" = 0. "Unicode can represent characters from more languages" = 1 mark. "Unicode uses more storage per character" = 1 mark. Specificity is everything.

The "Encode Your Name" Activity
5 minutes at the start. Students type their first name into the encoder. Discussion: is your name the same in binary in every country? (Yes, for ASCII names -- this is the whole point of agreed standards.) What about students with non-ASCII names? (They cannot be stored in ASCII -- this is the problem Unicode solves.)
This personal hook makes the abstract concept of encoding immediately concrete and memorable.
Exit Tickets
Given K=01001011, encode the word "MOM" using the ASCII pattern. [2 marks]
State one benefit and one drawback of Unicode compared to ASCII. [2 marks]
How many bits does ASCII use per character? How many characters can it represent? [2 marks]
Differentiation
Grade 4 Encode from a given ASCII table. State ASCII uses 7 bits. One benefit and one drawback of Unicode.
Grade 7 All of Grade 4 plus: calculate file size in bytes from character count. Explain why ASCII fails for global languages.
Grade 9 Compare UTF-8, UTF-16, UTF-32. Explain why UTF-8 dominates the web. Discuss why agreeing on a single standard was important for international communication.