Slowly Changing Dimension
Need
to design a slowly changing dimension for an information management
project and want an overview of key design considerations?
What is a
mart?
Data marts are a critical part of data warehouse design. They are
created to store data required for analysis by specific departments
e.g. manufacturing department, sales department or inventory control.
Data is extracted from the warehouse and transformed into facts and
dimensions within a data mart, which is then queried using client tools
or business intelligence reporting tools.
Sample Data Mart Design
What is a fact?
A data warehouse or data mart, classifies data as a factual information
event and the reference data that is used to analyze the fact e.g.
suppose an order was made, on a
specific date, to customer A, at address B, for a specific quantity of
product X, that was sold by sales representative for the amount is $100.
The order amount of $100 is considered a fact and the quantity of
product sold is considered a second fact.
What is a dimension?
Dimension data is the information that is used to analyze the factual
information or elemental transaction e.g. in the example above, product
hierarchy, customer, sales territory and date would be considered
dimensional data.
What is a date dimension?
A date or time period dimension is used to analyze facts by data or
time e.g. it can support queries that determine sales by:
- Date e.g. what were the total sales on August 24?
- Week e.g. what were the total sales for week ending
Aug 29?
- Month e.g. what were the total sales for August?
- Quarter e.g. what were the total sales for the third
quarter?
- Year e.g. what were the total sales for 2012?
- Fiscal year e.g. what were the total sales for fiscal
year 2009?
The
date dimension may be even more granular and, if the information is
available, might even support queries that can determine sales by time
of day.
What is a product dimension?
A product hierarchy dimension is used to analyze facts by the various
components of the product hierarchy e.g. it can support queries that
determine sales by:
- Product e.g Wilson Lead Pencil;
- Color e.g. Yellow;
- Package e.g. Box of 12;
- Product Type e.g. Pencils;
- Product Category e.g. Office Supplies; or
- Any other product related attribute that forms part
of the product hierarchy.
What is a customer dimension?
A customer dimension may include any attribute that is deemed important
for analysis purposes e.g.
- Customer name e.g. Wilson's Office Supplies;
- Customer type e.g. Business, Personal;
- Age;
- Gender; or
- Any other customer related attribute.
What is an address dimension?
Address dimension answers additional questions such as what product was
sold by:
- City;
- State/province;
- Country; or conversely
- It can be used to analyze which customers live in
each of the address
areas?
What is a sales territory
dimension?
A sales territory dimension may include any attribute that is deemed
important for sales analysis purposes e.g.
- Sales Representative e.g John Q;
- Sales Territory e.g. Long Island;
- Sales Metropolitan Area e.g. New York;
- Sales Area e.g. Eastern USA;
- Sales Region, e.g Continental USA.
What is a slowly changing
dimension?
A slowly changing dimension is a dimension that varies over time e.g.
in the example sales territory dimension, sales representative John Q
might leave, or be assigned to a new sales territory and be replaced by
sales representative Jane Q.
If there is no business requirement to track sales by individual sales
representatives, then there is no problem.
However, if there is a requirement to track sales by individual sales
representatives, then we have a slowly changing dimension.
If this is a requirement the it is necessary to store date ranges on
the dimension e.g. from date and thru date.
This is accomplished by inserting a new record into the dimension table
i.e. if any of the values in the dimension change then we would create
a new dimension record instead of updating values in the old row.
In the following example, we would add June 30, 2009 to the thru date
for John Q in Sales Territory Long Island, create a new row for Jane Q,
with a from date of July 1, and possibly create a new row for
John Q if he were assigned to a new sales territory.
Column
|
Current
|
Change
|
New
|
Sales Rep
|
John Q
|
Jane Q
|
John Q
|
Sales Territory
|
Long Island
|
Long Island
|
Albany
|
MSA
|
New York
|
New York
|
Albany
|
Area
|
Eastern
|
Eastern
|
Eastern
|
Region
|
USA
|
USA
|
USA
|
From Date
|
Jan 1, 2009
|
July 1, 2009
|
July 1, 2009
|
Thru Date
|
June 30, 2009
|
|
|
Summary...
A
slowly changing dimension is a dimension that varies over time. If
there is a business requirement to analyze facts by changes that occur
to the dimension then a new row needs to be inserted in the dimension
table reflecting the change.
|