|
Database Normalization
Want a
primer on database normalization?
What is a normal database
design?
Think of database normalization as a design, which stores data only in
one place in the database e.g., person name should only be stored in
one table in the database—There should not be multiple versions of
person name in multiple database tables.
Why is this important?
Normalization was derived in conjunction with the SQL database
language. The premise is that, with a normalized database, you can
extract any sub-set of data using basic SQL without the need to create
complex procedural language. This is important because people designed
SQL for use without formal training in programming languages.
Data modelers need to adhere to normalization rules which define data
base design as:
- First normal form;
- Second normal form; and
- Third normal form.
The goal is to achieve a third normal form design with no violations of
normalization rules.
What is first normal form?
First normal form is defined as not having any multi-valued attributes
e.g.
Order
#
|
Order
Date
|
Product
1
|
QTY
1
|
Purchase
Price 1
|
Product
2
|
QTY
2
|
Purchase
Price 2
|
1
|
Jan 1
|
Knife
|
4
|
$10
|
Knife Clip
|
4
|
$2
|
2
|
Jan 2
|
Axe Head
|
2
|
$20 |
Axe Handle
|
1
|
$5
|
3
|
Jan 3
|
Shirt
|
4
|
$40
|
Tie
|
3
|
$20
|
In this example, we have a purchase order table that stores each
purchase order as one row. This example violates first normal form
because:
To find the purchase order price, we need to multiply QTY 1 and QTY 2
by Purchase Price 1 and Purchase Price 2.
What is second normal
form?
Second normal form occurs when you have a multi-attribute primary key.
This means that attributes are stored which are dependent upon a
portion of the primary key.
If attribute B is dependent on attribute A then, if we know the value
of attribute A, we know enough to find the value of attribute B e.g.
Company id is the primary key for company, we then have enough to know
other attributes about company such as business start date or type of
business. These attributes are dependent upon Company id.
If we are dependent on only a portion of the primary key then we are
violating second normal form.
What is third normal form?
A violation of third normal form occurs when there is a transitive
dependency e.g. an attribute that is dependent upon an attribute that
is neither a primary key nor part of a candidate key. These are always
serious violations and must be corrected e.g.
Employee
Id
|
Employee
Name
|
Region
#
|
Region
Name
|
1
|
John D
|
1
|
Eastern
|
2
|
Jane D
|
1
|
Eastern
|
3
|
John Q
|
2
|
Central
|
4
|
Jane Q
|
2
|
Central
|
5
|
Bill A
|
1
|
Eastern
|
In the above example, a correctly designed model would have two tables
as follows:
Employee
Table
Employee
Id
|
Employee
Name
|
Region
#
|
1
|
John D
|
1
|
2
|
Jane D
|
1
|
3
|
John Q
|
2
|
4
|
Jane Q
|
2
|
5
|
Bill A
|
1
|
Region Table
Region
#
|
Region
Name
|
1
|
Eastern
|
2
|
Central
|
Why is database
normalization important?
If the database is not normalized, it may be difficult to extract data
without writing complex procedural language. A normalized database
reduces data storage requirements and simplifies data extraction.
Summary...
Database normalization is a database design, which stores data only in
one place in the database.
A normalized database reduces data storage requirements and simplifies
data extraction.
|