content="David Bowman, Saint Lambert, Quebec CANADA">
logo for information-management-architect.com
leftimage for information-management-architect.com

Database Normalization

Want a primer on database normalization?

What is a normal database design?

Think of database normalization as a design, which stores data only inDatabase Normalization one place in the database e.g., person name should only be stored in one table in the database—There should not be multiple versions of person name in multiple database tables.

Why is this important?

Normalization was derived in conjunction with the SQL database language. The premise is that, with a normalized database, you can extract any sub-set of data using basic SQL without the need to create complex procedural language. This is important because people designed SQL for use without formal training in programming languages.

Data modelers need to adhere to normalization rules which define data base design as:
  • First normal form;
  • Second normal form; and
  • Third normal form.
The goal is to achieve a third normal form design with no violations of normalization rules.

What is first normal form?

First normal form is defined as not having any multi-valued attributes e.g.

Order #
Order Date
Product 1
QTY 1
Purchase Price 1
Product 2
QTY 2
Purchase Price 2
1
Jan 1
Knife
4
$10
Knife Clip
4
$2
2
Jan 2
Axe Head
2
$20 Axe Handle
1
$5
3
Jan 3
Shirt
4
$40
Tie
3
$20


In this example, we have a purchase order table that stores each purchase order as one row. This example violates first normal form because:

To find the purchase order price, we need to multiply QTY 1 and QTY 2 by Purchase Price 1 and Purchase Price 2.

What is second normal form?

Second normal form occurs when you have a multi-attribute primary key. This means that attributes are stored which are dependent upon a portion of the primary key.

If attribute B is dependent on attribute A then, if we know the value of attribute A, we know enough to find the value of attribute B e.g. Company id is the primary key for company, we then have enough to know other attributes about company such as business start date or type of business. These attributes are dependent upon Company id.

If we are dependent on only a portion of the primary key then we are violating second normal form.

What is third normal form?

A violation of third normal form occurs when there is a transitive dependency e.g. an attribute that is dependent upon an attribute that is neither a primary key nor part of a candidate key. These are always serious violations and must be corrected e.g.

Employee Id
Employee Name
Region #
Region Name
1
John D
1
Eastern
2
Jane D
1
Eastern
3
John Q
2
Central
4
Jane Q
2
Central
5
Bill A
1
Eastern


In the above example, a correctly designed model would have two tables as follows:

Employee Table

Employee Id
Employee Name
Region #
1
John D
1
2
Jane D
1
3
John Q
2
4
Jane Q
2
5
Bill A
1

Region Table

Region #
Region Name
1
Eastern
2
Central


Why is database normalization important?

If the database is not normalized, it may be difficult to extract data without writing complex procedural language. A normalized database reduces data storage requirements and simplifies data extraction.

Summary...

Database normalization is a database design, which stores data only in one place in the database.

A normalized database reduces data storage requirements and simplifies data extraction.



footer for Information management page