|
Data Aggregation
Need to understand
information management data aggregation and want an overview of key
architectural components?
What is data aggregation?
Summary data is a key part of data warehouse design. It provides a cost
effective means of improving query performance without the cost of
investing in additional hardware and storage devices.
Creating summary table structure allows the data architect to design a
solution that will rapidly satisfy common queries that access a sub-set
of data.
What is a typical summary
data requirement?
Suppose that a marketing manager wants to promote the sale of
porterhouse steaks within a specific geographic area. This might
involve looking at customer profiles to determine which type of
customer might buy porterhouse steaks and by offering some form of
incentive to switch from round steaks, perhaps fish or pork products.
A typical query might look at:
- Customer income;
- Customer spending habits over a reasonable period of
time (Perhaps two or three years); and
- Customer buying habits for meat; within
- A specific geographic area e.g. all postal codes
within a specific state or province.
If we look at the steps involved in executing the above query, we can
see that is might be simpler to have a series of pre-aggregated
summary tables. This might summarize sales of customers who purchase
meat by:
- Customer;
- Location;
- Income; and
- Spending habits.
How are summary tables
created?
Aggregations can be generated by:
- Developing ETL software, to summarize data from data
warehouse source tables and reload the summary tables each time a new
data warehouse load is completed;
- Creating views to summarize data from data warehouse
source tables each time the view is called; and
- Creating materialized views, to summarize from data
warehouse source tables each time a new data warehouse load is
completed.
What is involved with
designing aggregation tables?
A data architect has to:
- Determine which dimensions need aggregation;
- Determine the aggregation of multiple values;
- Aggregate multiple facts into summary tables;
- Determine the level of aggregation;
- Determine the extent of embedding dimension data into
summary tables;
- Add time into the summary table; and
- Determine the best indexing strategy for the
aggregation.
Summary...
Data aggregation is a key part of data warehouse design. It provides a
cost
effective means of improving query performance without the cost of
investing in additional hardware and storage devices.
Creating summary table structure allows the data architect to design a
solution that will rapidly satisfy common queries that access a sub-set
of data.
|