Relational Database Design
Physical Data Model
Need to understand information management
relational database design and want an overview of key architectural
components?
What is data warehouse
architecture?
A data warehouse is the foundation for information management and
involves:
- Extracting data from operational systems;
- Moving it into data warehouse structures;
- Reorganizing and structuring the data for
analysis purposes; and
- Moving it into reporting structures called data
marts.
What are data warehouse
structures?
Data warehouse database design is based on two key design concepts:
- Relational database design; and
- Dimensional, or star schema design.
What is the foundation
for data warehouse relational database design?
Successful data warehouse design is a process that commences with a
series of data models.
- Conceptual
data model is
created at the
information management strategy
stage. It contains key entities and relationships and
presents a
high level look at all of the entities within an organization.
- Enterprise data model is an entity relationship
diagram which builds on the conceptual data model and adds additional
details;
- Logical
data model, is a
fully attributed entity
relationship diagram (ERD), which shows
each
entity, its relationship to other entitles and specifies the applicable
business rules;
- Dimensional
model, represents facts and dimensions,
which are required to store dimension data in an efficient manner for
reporting and analysis purposes; and
- Physical data model, is the final
representation of the relational database design structures that will
be
generated from the model. It contains the detailed specifications for
the database design and, in a model driven environment, the modeling
tool will generate the data definition language (DDL)
that is used to create the database structure. The physical model can
be used to generate:
- Databases;
- Tablespaces
- Files;
- Instances;
- Tables, columns and datatypes;
- Constraints;
- Partitions;
- Indexes; and
- Views.
What are databases?
Databases
are created to store a set of data
in a relational manner.
Database software such as Oracle, Teradata, Sybase, DB2, and PostgreSQL
are
generally called relational database management systems (RDBMS). Data
in a database is stored in tables. Relational tables are defined by
their columns and have a name. Data is stored in rows within a table e.g
The following table (Customer) has three columns, and four rows.
Row
#
|
First
Name
|
Last
Name
|
Account
NBR
|
1
|
John
|
Doe
|
123
|
2
|
Jane
|
Doe
|
456
|
3
|
John
|
Q
|
789
|
4
|
Jane
|
Q
|
101112
|
What are tablespaces?
Oracle uses tablespaces to logically divide a database. Each database
has a system tablespace and may have other tablespaces as defined by
the
database administrator e.g. a tablespace for roll-back segments.
What are files?
Each tablespace is comprised of one or many datafiles which physically
store the data.
What are database
instances?
A database instance, sometimes called a server, is a set of memory
structures and processes that access a set of datafiles. Datafiles may
be accessed by more than one instance.
What are tables, columns
and datatypes?
Tables are the storage mechanism used to store data. They all have a
name e.g. ACCOUNT, and this name must conform to the RDBMS naming
convention.
Each table is comprised of a fixed number of columns which also must
have a name e.g. CUSTOMER_FIRST_NAME.
Each column must have a datatype assigned e.g. CHAR, VARCHAR, VARCHAR2,
BLOB, NUMBER, DATE.
What are relational
database design constraints?
Constraints are an important part of relational database design. These
are documented in the physical data model. Constraints might
include:
- Primary key constraint e.g. customer_number might be
defined as a primary key. This
means that in must also be NOT NULL and must be unique--Attempts to
insert a
duplicate customer_number will be rejected by the RDBMS;
- NOT
NULL constraint means that a value must be inserted when a row is
created e.g. if customer_account_number is defined as NOT NULL then
attempts to insert a row without a customer_account_number will be
rejected;
- Default is another type of constraint the is used to
insert a default value for a column whenever a row is inserted into the
table and no value is entered for the column;
- Check constraint is used to ensure that a value
inserted in a column meets a predefined value e.g. transaction amount
must be less than 100,000;
- Unique constraint specifies that the value in a
column may only occur once in the table e.g. in a country name table, a
country name can only occur once; and
- Foreign key constraint is defined when two tables are
related and the value in one table is dependent upon finding a value in
a related table e.g. address table may include a country code column
which is a foreign key relationship, or constraint, to country code
which is in country name table. Attempts to insert a country code in
the address table, that does not exist in the country name table, will
fail.
Database constraints help ensure referential integrity.
What are partitions?
Large tables can be split into smaller physical tables using a variety
of techniques. This may have performance advantages particularly with
data warehouse design. Some s software will allow parallel loading
which will improve performance. It's also easier to truncate a complete
partition from a database without having to truncate data in other
partitions.
What are relational
database design indexes?
An index is used to quickly find a row in a table. Indexes contain a
list of entries with a key value and a row id for the table. Indexes
are used to improve performance and to ensure uniqueness of a column
e.g.
- A bitmap index might be created when there are few
distinct values in a column and are quite useful for static data;
- A cluster index is created for tables
that are frequently accessed together. The data in the tables is
stored together to minimize inserts and retrievals
and the related columns of the table are called the cluster key. The
cluster key is indexed using a cluster index and it's value is only
stored once for the multiple tables in the cluster;
- Hash clusters use a hashing function on the row's
cluster key to determine the physical location where the row will be
stored.
What are views?
A view appears to be a table with columns. It can be queried in the
same way a table can be queried. When a view is queried, it then
queries the base tables of the view and returns the values in the
format and order specified in the view. Since there is no physical data
directly associated with views, views cannot be indexed.
What are materialized
views?
A relational database design materialized view is similar to a view
except that data is actually stored in a table. The view is refreshed
based on the frequency established when the materialized view is
created. This is particularly important with data warehouse design
where frequent data base access can be costly.
Since the materialized view is actually a table, it is possible to
index it for performance purposes.
Summary...
Relational database design is documented in a
physical data model which
can be used to generate all physical database objects.
A data architect creates the physical relational
database design based on entity relationship diagrams and logical data
models.
|