logo for information-management-architect.com
leftimage for information-management-architect.com

Data Warehouse Architecture

Improve service with David Bowman’s information management guidelines for data warehouse architecture

This site is designed for Information Technology professionals who need  to improve service and want guidance and direction to help teams consistently produce error free results.

It provides information management guidelines for data warehouse architecture.

What is an enterprise data warehouse?

Think of a data warehouse as a central storage facility which collects information from many sources, manages it for efficient storage and retrieval, and delivers it to many audiences, usually to meet decision support and business intelligence requirements.

Remember the days before we had computers?

Information was stored in file folders or sometimes forgotten in desk drawers.

It was never available in the right hands at the right time to make the right decision until records management was introduced and central filing systems were established. 

These central-filing systems gathered information from everyone (many sources), managed it for efficient storage and retrieval, and delivered it to many audiences for analysis purposes.

These central-filing systems helped minimize duplication of data and went a long way towards eliminating problems finding files that were lost on some-one’s desk while they took an extended winter vacation.

They also helped reduce information storage requirements as they eliminated the need for many filing cabinets, in many departments to hold duplicate files.

These central filing systems were governed by records management experts who took ownership for establishing records management principles, file storage design, file management processes, security, and cataloging to ensure easy file retrieval.

Data warehousing is similar to central records management!

What is Data Warehouse Architecture?

Data architecture describes how data is stored and used at every step of the data warehousing and business intelligence process from the operational systems to the analytic applications. 

The following diagram depicts a typical data warehouse and business intelligence environment.

data warehouse architecture

Information management solutions are usually involved with some, or all of, the following:
  • Extracting data from operational source systems;
  • Transforming data as required to ensure consistency;
  • Loading data into a data warehouse or some form of storage optimized for efficient storage;
  • Extracting data from the data warehouse;
  • Loading departmental data marts optimized for reporting and analytics; and
  • Some form of reporting tool architecture e.g. universe design or some middle ware design to insulate users from the data warehouse.
Information is a valuable corporate asset and getting the right information, to the right people at the right time is a key management objective.

Achieving this objective requires robust technology and optimal data architecture based on industry best practices.

Data Warehouse Architecture

Data Warehouse
  • Should provide a single data warehouse repository to integrate data across the enterprise;
  • Should store detailed data, along with selective denormalizations, so as to provide extensibility and appropriate performance;
  • Should store historical data and not permit  retroactive adds and deletes;
  • Should be the source for downstream enterprise business intelligence and analytics; and
  • Should not provide direct access to knowledge workers.
Data Marts
  • Should be deployed on a selective basis to meet extreme performance levels for specific analytic applications or to support non-relational file structures;
  • Should contain a subset (usually summarized) of the data from the data warehouse required for a specific application;
  • Should be sourced from an enterprise data warehouse;
  • Should avoid the creation of independent data marts from separate source systems;
  • Should consider virtual data marts, using views on top of data warehouse tables whenever possible; and
  • Should provide direct access to knowledge workers.
Operational Data Stores

Should provide an integrated information environment to support tactical decision-making and operational reporting.
Sandbox
  • Should provide power users the capability of creating their own sandbox up to a pre-defined size without requiring assistance or permission from database administrators or other information technology staff;
  • Should provide power users the capability to load their own data into their sandbox;
  • Should provide the capability to combine sandbox data with production data for specialized analytics;
  • Should provide the capability to restrict sandbox access to only authorized sandbox users;
  • Should ensure that sandbox data is not used to production reporting;
  • Should ensure that sandbox data is not used to provision other reporting data marts; and
  • Should ensure that governance processes purge data per a predefined service level agreement.
Development, Test and Production
  • Should provide separate environments for development, testing, and production;
  • Should have an approved change management process for  hardware and software environments; and
  • Should have appropriate change controls in place, to govern  code migration from one environment to another triggered by the results of test plans combined with management of release schedules.
Data Warehouse Architecture
  • Should segment data warehouse components such as data integration, reporting or analysis
  • Should support the Information Technology subject area  just like any other business function and has similar analytic systems.
  • Should provide an optimal balance between data storage, data integration and data access? e.g. the best structure for data storage is often a poor choice for data access and vice-versa.
  • Should support business requirements
  • Should provide both audit metadata columns and processing metadata columns to facilitate error detection, error resolution, merge/purge activities, archiving, unloading, and change data capture
  • Should be readily available to everyone who needs them
Data Warehouse Architecture Considerations

A typical warehouse might load many millions of records per day. It involves extracting data from many source systems, verifying data quality, cleansing the data in some cases, loading it into staging areas for initial processing, moving it from staging to the warehouse and then moving it into data marts, and sometimes aggregating the data to help reporting performance.

In addition, data must be purged from the warehouse when it reaches its retention period and backup must occur with no interruption to availability or performance.

There are a lot of moving parts and it is very important that overall design consider all significant data integration factors to ensure efficient storage and effective use of the data.

Summary...

Information is a valuable corporate asset and getting the right information, to the right people at the right time is a key management objective.

Achieving this objective requires robust technology and optimal data architecture based on industry best practices.

This site provided information management guidelines for data warehouse architecture.