logo for information-management-architect.com
leftimage for information-management-architect.com

Relational Database Design
Physical Data Model

Need to understand information management relational database design and want an overview of key architectural components?

What is data warehouse architecture?

A data warehouse is the foundation for information management andRelational Database Design involves:
  • Extracting data from operational systems;
  • Moving it into data warehouse structures;
  • Reorganizing and structuring the data for analysis purposes; and
  • Moving it into reporting structures called data marts. 
What are data warehouse structures?

Data warehouse database design is based on two key design concepts:
  • Relational database design; and
  • Dimensional, or star schema design.
What is the foundation for data warehouse relational database design?

Successful data warehouse design is a process that commences with a series of data models.
  • Conceptual data model  is created at the information management strategy stage.  It contains key entities and relationships and presents a high level look at all of the entities within an organization.
  • Enterprise data model is an entity relationship diagram which builds on the conceptual data model and adds additional details;
  • Logical data modelis a fully attributed entity relationship diagram (ERD), which shows each entity, its relationship to other entitles and specifies the applicable business rules;
  • Dimensional model, represents facts and dimensions, which are required to store dimension data in an efficient manner for reporting and analysis purposes; and
  • Physical data model, is the final representation of the relational database design structures that will be generated from the model. It contains the detailed specifications for the database design and, in a model driven environment, the modeling tool will generate the data definition language (DDL) that is used to create the database structure. The physical model can be used to generate:
    • Databases;
    • Tablespaces
    • Files;
    • Instances;
    • Tables, columns and datatypes;
    • Constraints;
    • Partitions;
    • Indexes; and
    • Views.
What are databases?

Databases are created to store a set of data in a relational manner.  Database software such as Oracle, Teradata, Sybase, DB2, and PostgreSQL are generally called relational database management systems (RDBMS). Data in a database is stored in tables. Relational tables are defined by their columns and have a name. Data is stored in rows within a table e.g

The following table (Customer) has three columns, and four rows.

Row #
First Name
Last Name
Account NBR
1
John
Doe
123
2
Jane
Doe
456
3
John
Q
789
4
Jane
Q
101112

What are tablespaces?

Oracle uses tablespaces to logically divide a database. Each database has a system tablespace and may have other tablespaces as defined by the database administrator e.g. a tablespace for roll-back segments.

What are files?

Each tablespace is comprised of one or many datafiles which physically store the data.

What are database instances?

A database instance, sometimes called a server, is a set of memory structures and processes that access a set of datafiles. Datafiles may be accessed by more than one instance.

What are tables, columns and datatypes?

Tables are the storage mechanism used to store data. They all have a name e.g. ACCOUNT, and this name must conform to the RDBMS naming convention.

Each table is comprised of a fixed number of columns which also must have a name e.g. CUSTOMER_FIRST_NAME.

Each column must have a datatype assigned e.g. CHAR, VARCHAR, VARCHAR2, BLOB, NUMBER, DATE.

What are relational database design constraints?

Constraints are an important part of relational database design. These are  documented in the physical data model. Constraints might include:
  • Primary key constraint e.g. customer_number might be defined as a primary key. This means that in must also be NOT NULL and must be unique--Attempts to insert a duplicate customer_number will be rejected by the RDBMS;
  • NOT NULL constraint means that a value must be inserted when a row is created e.g. if customer_account_number is defined as NOT NULL then attempts to insert a row without a customer_account_number will be rejected;
  • Default is another type of constraint the is used to insert a default value for a column whenever a row is inserted into the table and no value is entered for the column;
  • Check constraint is used to ensure that a value inserted in a column meets a predefined value e.g. transaction amount must be less than 100,000;
  • Unique constraint specifies that the value in a column may only occur once in the table e.g. in a country name table, a country name can only occur once; and
  • Foreign key constraint is defined when two tables are related and the value in one table is dependent upon finding a value in a related table e.g. address table may include a country code column which is a foreign key relationship, or constraint, to country code which is in country name table. Attempts to insert a country code in the address table, that does not exist in the country name table, will fail.
Database constraints help ensure referential integrity.

What are partitions?

Large tables can be split into smaller physical tables using a variety of techniques. This may have performance advantages particularly with data warehouse design. Some s software will allow parallel loading which will improve performance. It's also easier to truncate a complete partition from a database without having to truncate data in other partitions.

What are relational database design indexes?

An index is used to quickly find a row in a table. Indexes contain a list of entries with a key value and a row id for the table. Indexes are used to improve performance and to ensure uniqueness of a column e.g.
  • A bitmap index might be created when there are few distinct values in a column and are quite useful for static data;
  • A cluster index is created for tables that  are frequently accessed together. The data in the tables is stored together to minimize inserts and retrievals and the related columns of the table are called the cluster key. The cluster key is indexed using a cluster index and it's value is only stored once for the multiple tables in the cluster;
  • Hash clusters use a hashing function on the row's cluster key to determine the physical location where the row will be stored.
What are views?

A view appears to be a table with columns. It can be queried in the same way a table can be queried. When a view is queried, it then queries the base tables of the view and returns the values in the format and order specified in the view. Since there is no physical data directly associated with views, views cannot be indexed.

What are materialized views?

A relational database design materialized view is similar to a view except that data is actually stored in a table. The view is refreshed based on the frequency established when the materialized view is created. This is particularly important with data warehouse design where frequent data base access can be costly.

Since the materialized view is actually a table, it is possible to index it for performance purposes.

Summary...

Relational database design is documented in a physical data model  which can be used to generate all physical database objects.

A data architect creates the physical relational database design based on entity relationship diagrams and logical data models.

footer for Information management page