Data Warehouse
Process Architecture
Need
to define data warehouse process architecture for an information
management project and want an overview of key architectural components?
What is a
data warehouse?
Think
of a data warehouse as a central storage facility which collects
information from many sources, manages it for efficient storage and
retrieval, and delivers it to many audiences, usually to meet decision
support and business intelligence requirements.
Data warehousing includes the following components:
- Extracting data from operational systems;
- Moving it into data warehouse structures;
- Reorganizing and structuring the data for
analysis purposes; and
- Moving it into reporting structures called data
marts.
What is
process architecture?
Data warehouse architecture involves the following components:
- Load management;
- Warehouse management; and
- Reporting management.
What is load
management?
Load management includes all of the software and utilities required to:
- Extract source system data and move it to the
warehouse environment;
- Complete
basic transformation to ensure that non-essential data is eliminated
and other data is converted to appropriate data types;
- Fast load data into a staging area where it can
be subsequently manipulated;
- Extract data from staging area and load
clearing house tables;
- Extract data from clearing house tables and
load the data warehouse; and
- Extract data from the data warehouse and load
data marts.
Load management involves the following types of software and utilities:
- Some form of scheduling software, such as
Control M, Autosys, Unix shell scripts or job control language;
- Extract
software, which might include custom extract routines developed in the
source application language, file copy utilities, stored procedures or
data integration software such as Informatica or Ab Initio;
- Fast
load utilities are usually provided by relational database management
system vendors and should be considered as an effective means of
loading data before simple transformations are applied; and
- Other
software such as stored procedures or data integration software is also
used to complete fast load and perform basic transformations that do
not require complex logic routines.
What is
warehouse management?
Warehouse management involves all of the software and system management
utilities required to:
- Clean and transform data;
- Create temporary holding tables to accommodate
merging data for analysis or cleansing purposes;
- Create and/or maintain indexes, views, and
table partitions;
- Aggregate data as necessary;
- De-normalize data if needed for performance
purposes;
- Archive data in each of the data warehouse
environments; and
- Complete incremental or full back-ups as needed.
Warehouse
management software and utilities is generally a combination of system
and database management utilities plus custom software such as stored
procedures or data integration software.
What is
reporting management?
Reporting
management involves the software required to ensure that business
intelligence reporting tools direct queries to the data that will
provide the quickest query response.
Reporting business
intelligence software such as Business Objects or Microstrategy
generally handles reporting management with support from custom
routines developed using relational database management software.
Summary...
A
data warehouse involves a lot of moving parts and requires a
significant investment in process architecture to ensure successful
operation and maintenance.
To ensure that service level
agreements are met, process architecture must ensure that all system
processes complete during the load window.
|