Star Schema
Need to create a star schema for
an information management project and want an overview of key design
considerations?
What is a
mart?
Data marts are a critical part of data warehouse design. They are
created to store data required for analysis by specific departments
e.g. manufacturing department, sales department or inventory control.
Data is extracted from the warehouse and transformed into facts and
dimensions within a data mart, which is then queried using client tools
or business intelligence reporting tools.
Sample Data Mart Design
What are
facts and dimensions?
A fact is something that has occurred e.g. an invoice was sent, on a
specific date, to customer A, at address B, for a specific quantity of
product X, that was sold by sales representative Y.
A dimension is a way of analyzing the fact e.g.
Date or time period dimension answers questions such as what was the
quantity of product X sold by:
- Date;
- Week;
- Month;
- Quarter;
- Year; or
- Fiscal year.
Product dimension answers additional questions such as what was the
quantity of product X sold by?
- Product type;
- Product category;
- Product market interest group; or
- Any other product related attribute
Customer dimension answers additional questions such as:
- Which customers purchased product X; or
conversely
- What products were purchased by customer A;
Address dimension answers additional questions such as what product was
sold by:
- City;
- Sales territory;
- State/province;
- Country; or conversely
- Which customers live in each of the address
areas?
Why is it
called a star schema?
A star schema usually has the fact in the middle with the dimensions
surrounding it and it looks like a star, hence the name, star schema.
What are
star schema design considerations?
Look for business transactions—These will usually translate into facts
e.g. sales transaction, invoice, order, account transaction, customer
claim, sales call event, customer event such as new service
installation, move, or service disconnect.
Decide what dimensions are applicable—looking at the logical data model
can help derive dimensions.
Ensure that a fact is not really a dimension with embedded facts e.g. a
product fact might include date ordered, date re-ordered, date drop
shipped, etc. Each date is actually an event, which occurred, and the
event is really the fact of interest and the product is one of the
dimensions for the event.
Ensure that a dimension is not really a fact e.g. if the requirement is
to analyze customer profiles, then the customer is a fact. If the
requirement is to analyze sales, then the customer is a dimension.
What are
fact table design considerations?
Consider history analysis requirements e.g. it may be appropriate to
store detail history in the data warehouse but it may not be necessary
in the fact table. Consider if the reporting requirements can be met
with sample data or even summary data.
Eliminate operational and non-essential data e.g. operational status
information (This data should have already been removed before storing
it in the warehouse).
Consider column size. Fact tables may contain very large volumes of
data and small decreases in column size can have a dramatic impact on
database size.
Consider how to store dates on the fact table e.g. using a surrogate
key linked to a dimension or storing a physical date linked to
dimension. This is one case where it more practical to use a physical
date instead of a surrogate key.
Consider data partitioning to improve load performance, query
performance and archiving operations.
What are
dimension table design considerations?
Star schema design improves query performance by de-normalizing
reference data into a single dimension table.
Consider if snowflake design is required to optimize query performance.
Consider how slowly changing dimensions will be handled e.g. if there
is a requirement to compare facts within a grouping that exits today
with a grouping that existed a year ago, it is necessary to create a
new row in the dimension table, with a date, whenever the grouping
changes.
Summary...
Star schema's are a critical part of data warehouse design. They are
created to store data required for analysis by specific departments
e.g. manufacturing department, sales department or inventory control.
A data architect has to consider fact and dimension design for optimal
storage and query performance.
|