Database Concepts
Understand what databases are, why flat-file databases cause problems, and how relational databases solve them using tables, records, fields, data types and primary keys.
Every time you log into an app, search for a product, or book a ticket, a database is working behind the scenes. Spotify stores millions of songs, artists and users. Amazon stores billions of products, orders and customers. How do they keep all that data organised without it becoming a contradictory mess?
A flat-file database stores all data in a single table. For simple data about one type of thing, this works fine - a table of book titles is a flat file. Problems start when your data involves multiple related entities.
The example below stores student enrolments. Notice what happens when three students are on the same course:
| StudentID | StudentName | Age | CourseName | CourseCode | TeacherName | TeacherEmail |
|---|---|---|---|---|---|---|
| 1 | Alice Smith | 15 | GCSE Maths | MA11 | Mr Jones | jones@school.uk |
| 2 | Bob Brown | 16 | GCSE Maths | MA11 | Mr Jones | jones@school.uk |
| 3 | Clara Davis | 15 | GCSE Maths | MA11 | Mr Jones | jones@school.uk |
| 4 | Dan Evans | 15 | GCSE CS | CS11 | Ms Patel | patel@school.uk |
Red cells = redundant data (identical values repeated across multiple rows)
Update anomaly: If Mr Jones changes his email, you must update every row where he appears. Miss one row and the data is contradictory.
Insertion anomaly: You cannot add a new course to the database without also adding a student - there is nowhere else to put course data.
Deletion anomaly: If the last student on a course leaves, deleting their row also deletes all information about the course and teacher.
Whether flat-file or relational, every database is built from tables. A table has columns (fields) and rows (records).
| StudentID | FirstName | LastName | DateOfBirth | IsEnrolled |
|---|---|---|---|---|
| 1001 | Alice | Smith | 2009-04-12 | TRUE |
| 1002 | Bob | Brown | 2008-11-03 | TRUE |
| 1003 | Clara | Davis | 2009-07-22 | FALSE |
Each column is a field - one attribute of every student. Each row is a record - all the data about one specific student. The table above has 5 fields and 3 records.
Every field must have a data type - this tells the database what kind of data to expect and how to store it.
You may be asked to choose an appropriate data type for a given field. Always consider whether the data is a number (will you do maths on it?), text, true/false, or a date. Storing a phone number as an integer is wrong - you cannot do arithmetic on phone numbers, and leading zeros would be lost.
A primary key is a field chosen to uniquely identify each record in the table. It is the foundation of database design.
In the Students table above, StudentID is the primary key. A student's name could match another student's name, their date of birth could match, but their ID is generated to be unique. Real-world examples: National Insurance number, ISBN for books, product barcodes.
You may be asked why a student's name is not a suitable primary key. The answer: names are not guaranteed to be unique (two students could be called Alice Smith), and a name field could be left blank. An automatically assigned ID has neither problem.
Test yourself
1. A database stores employee records. If one employee's department name appears in 50 rows and the department moves building, what is the minimum number of rows that must be updated in a flat-file database?
2. Which of the following is the most suitable primary key for a table of books?
3. What is the correct name for a single row in a database table?
A student suggests using a student's email address as a primary key instead of assigning a numeric ID. Give two reasons why this could cause problems.
Practice what you have learned
Three levels of worksheet for this lesson. Download, print and complete offline.