Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. It involves decomposing a table into less redundant (and smaller) tables without losing information and ensuring data dependencies make sense. The goal is to reduce data anomalies and improve data integrity.
Normalization typically involves several stages, known as normal forms. The most common are:
Normalization
A table is in 1NF if:
Atomic Values
Before 1NF - Student_Course Table
StudentID | Name | Courses | Instructor | Department |
---|---|---|---|---|
1 | Alice Johnson | Math, Science | Prof A, Prof B | Math, Science |
2 | Bob Lee | English, History | Prof C, Prof D | English, Humanities |
After 1NF - - Student_Course Table
StudentID | Name | Course | Instructor | Department |
---|---|---|---|---|
1 | Alice Johnson | Math | Prof A | Math |
1 | Alice Johnson | Science | Prof B | Science |
2 | Bob Lee | English | Prof C | English |
2 | Bob Lee | History | Prof D | Humanities |
Do first and last names need to be separated for the table to be in 1NF?
Some people would argue that first and last names are separate pieces of information and therefore should be separated into different fields in order for the table to be in 1NF. However in practice it largely depends on what you intend to do with the data.
If you are going to want to be editing the first or last name separately, or if you want to search/sort/filter by first or last name then it is certainly good practice to store each piece of data separately, but it's not strictly required for 1NF
What is the first normal form (1NF)?
A table is in 2NF if:
A table is not in 2NF if you can guess the value of a non-key column using only part of a composite primary key.
If there is no composite key (the primary key is a single column), the table is automatically in 2NF as long as it’s in 1NF. This is because partial dependencies only occur with composite keys.
Before 2NF - - Student_Course Table
StudentID | Name | Course | Instructor | Department |
---|---|---|---|---|
1 | Alice Johnson | Math | Prof A | Math |
1 | Alice Johnson | Science | Prof B | Science |
2 | Bob Lee | English | Prof C | English |
2 | Bob Lee | History | Prof D | Humanities |
A relation that is in 2nd normal form (2NF) is automatically in which normal form?
StudentID | Name |
---|---|
1 | Alice Johnson |
2 | Bob Lee |
Course | Instructor | Department |
---|---|---|
Math | Prof A | Math |
Science | Prof B | Science |
English | Prof C | English |
History | Prof D | Humanities |
StudentID | Course |
---|---|
1 | Math |
1 | Science |
2 | English |
2 | History |
For 3NF, the table must already be in 2NF, and there must be no transitive dependencies. Each non-key attribute must depend only on the primary key.
A table is in 3NF if you can’t figure out the value of one non-key column using another non-key column; everything must come directly from the primary key.
Course | Instructor | Department |
---|---|---|
Math | Prof A | Math |
Science | Prof B | Science |
English | Prof C | English |
History | Prof D | Humanities |
The Courses Table is not in 3NF because it has a transitive dependency.
Primary Key: The primary key here is Course
.
Problem: Instructor
depends on Course
, but Department
depends on Instructor
, which in turn depends on Course
.
To make the table 3NF, you should eliminate the transitive dependency by removing the link between Instructor
and Department
. You can do this by creating a new table for instructors.
Courses Table
Course | Instructor |
---|---|
Math | Prof A |
Science | Prof B |
English | Prof C |
History | Prof D |
Instructors Table
Instructor | Department |
---|---|
Prof A | Math |
Prof B | Science |
Prof C | English |
Prof D | Humanities |
Now, Department
depends directly on Instructor
, and there is no transitive dependency. The tables are in 3NF.
What is the purpose of the third normal form (3NF)?