logo for information-management-architect.com
Home
Strategy
Framework
Business Case Analysis
Project Planning
Requirements Analysis
Architecture & Design
Build Phase
Quality Assurance
Transition to Production
Management Information
Business Intelligence
Data Warehouse
Tools
Jobs
Contact David Bowman
leftimage for information-management-architect.com

Star Schema

Need to create a star schema for an information management project and want an overview of key design considerations?

What is a mart?

Data marts are a critical part of data warehouse design. They are created to store data required for analysis by specific departments e.g. manufacturing department, sales department or inventory control.

Data is extracted from the warehouse and transformed into facts and dimensions within a data mart, which is then queried using client tools or business intelligence reporting tools.

Sample Data Mart DesignStar Schema
What are facts and dimensions?

A fact is something that has occurred e.g. an invoice was sent, on a specific date, to customer A, at address B, for a specific quantity of product X, that was sold by sales representative Y.

A dimension is a way of analyzing the fact e.g.
Date or time period dimension answers questions such as what was the quantity of product X sold by:
  • Date;
  • Week;
  • Month;
  • Quarter;
  • Year; or
  • Fiscal year.
Product dimension answers additional questions such as what was the quantity of product X sold by?
  • Product type;
  • Product category;
  • Product market interest group; or
  • Any other product related attribute
Customer dimension answers additional questions such as:
  • Which customers purchased product X; or conversely
  • What products were purchased by customer A;
Address dimension answers additional questions such as what product was sold by:
  • City;
  • Sales territory;
  • State/province;
  • Country; or conversely
  • Which customers live in each of the address areas?
Why is it called a star schema?

A star schema usually has the fact in the middle with the dimensions surrounding it and it looks like a star, hence the name, star schema.

What are star schema design considerations?

Look for business transactions—These will usually translate into facts e.g. sales transaction, invoice, order, account transaction, customer claim, sales call event, customer event such as new service installation, move, or service disconnect.

Decide what dimensions are applicable—looking at the logical data model can help derive dimensions.

Ensure that a fact is not really a dimension with embedded facts e.g. a product fact might include date ordered, date re-ordered, date drop shipped, etc. Each date is actually an event, which occurred, and the event is really the fact of interest and the product is one of the dimensions for the event.

Ensure that a dimension is not really a fact e.g. if the requirement is to analyze customer profiles, then the customer is a fact. If the requirement is to analyze sales, then the customer is a dimension.

What are fact table design considerations?

Consider history analysis requirements e.g. it may be appropriate to store detail history in the data warehouse but it may not be necessary in the fact table. Consider if the reporting requirements can be met with sample data or even summary data.

Eliminate operational and non-essential data e.g. operational status information (This data should have already been removed before storing it in the warehouse).

Consider column size. Fact tables may contain very large volumes of data and small decreases in column size can have a dramatic impact on database size.

Consider how to store dates on the fact table e.g. using a surrogate key linked to a dimension or storing a physical date linked to dimension. This is one case where it more practical to use a physical date instead of a surrogate key.

Consider data partitioning to improve load performance, query performance and archiving operations.

What are dimension table design considerations?

Star schema design improves query performance by de-normalizing reference data into a single dimension table.

Consider if snowflake design is required to optimize query performance.

Consider how slowly changing dimensions will be handled e.g. if there is a requirement to compare facts within a grouping that exits today with a grouping that existed a year ago, it is necessary to create a new row in the dimension table, with a date, whenever the grouping changes.

Summary...

Star schema's are a critical part of data warehouse design. They are created to store data required for analysis by specific departments e.g. manufacturing department, sales department or inventory control.

A data architect has to consider fact and dimension design for optimal storage and query performance.


footer for Information management page