logo for information-management-architect.com
leftimage for information-management-architect.com

Data Warehouse Infrastructure

Improve service with David Bowman’s information management guidelines for data warehouse infrastructure

This site is designed for Information Technology professionals who need to improve service and require guidance and direction to help teams consistently produce error free results.

It provides information management guidelines for data warehouse infrastructure

What is a Data Warehouse?

The following diagram depicts a typical data warehouse and business intelligence environment.

data warehouse architecture

Data warehouse solutions are usually involved with some, or all of, the following:
  • Extracting data from operational source systems;
  • Transforming data as required to ensure consistency;
  • Loading data into a data warehouse or some form of storage optimized for efficient storage;
  • Extracting data from the data warehouse;
  • Loading departmental data marts optimized for reporting and analytics; and
  • Some form of reporting tool architecture e.g. universe design or some middle ware design to insulate users from the data warehouse.
Data Warehouse infrastructure is required to support data architecture, data integration, information delivery, and production and operations.

Relational Database Platform
  • Should provide scalable hardware using an ANSI SQL compliant relational database;
  • Should provide separate systems for development, test and production;
  • Should provide the capability to implement change without impacting the production environment;
  • Should provide a test environment to test new versions of software before promotion into the production environment; and
  • Should consider common relational database management systems such as DB2, from IBM, Oracle, Teradata, SQL Server, from Microsoft, and other specialized systems such as Aster Database, from Teradata, SQL Server Parallel Data Warehouse, from Microsoft, Sybase IQ, from SAP, Appache Cassandra, Greenplum, from Pivotal and Hadoop or Hortonworks.
Business intelligence Infrastructure
  • Should support business intelligence solution design;
  • Should support business analytic requirements;
  • Should support business intelligence reporting;
  • Should support decision support requirements;
  • Should provide knowledge workers the capability of direct access to detailed data using a standardized tool suite;
  • Should not require intervention by Information Technology professionals;
  • Should provide a semantic metadata layer to allow end users to have a business view of the data which is translated for access into the underlying physical database structure by the tool;
  • Should consider allowing power users direct access to data using more advanced tools;
  • Should implement security through a combination of data and application level access controls;
  • Should consider common business intelligence tools such as Business Objects, from SAP, Cognos, from IBM, Enterprise Guide, from SAS, MicroStrategy, Oracle Business Intelligence Enterprise Edition, from Oracle, SQL Server Analysis Services, from Microsoft, and WebFocus, from IBM;
  • Should consider common data visualization and discovery tools such as Visual Analytics, from SAS, Domo, QlikView, from QlikTech, Spotfire, from TIBCO and Tableau; and
  • Should consider common data mining tools such as Enterprise Miner, from SAS, KXEN, from SAP and SPSS, from IBM.
Data Integration Infrastructure
  • Should capture all business rules for data transformation in the ETL tool meta data repository;
  • Should provide the capability to develop code required to extract data form source systems, transform per business transformation rules and load it into appropriate data warehouse storage environments;
  • Should provide the capability to schedule data extraction from source systems and transform and load it into appropriate staging areas, data warehouses and data marts;
  • Should provide repositories to store data integration software and common routines;
  • Should provide the capability to perform impact analysis; and
  • Should consider common data integration tools such Ab Initio, PowerCenter, from Informatica, Ascential DataStage, from IBM, SQL Server Integration Services, from Microsoft, Business Objects Data Services, from SAP, Data Integration Studio, from SAS and Pentaho, or Open Studio for Data Integration, from Talend.
Metadata Management Infrastructure
  • Should provide the capability to store design metadata;
  • Should provide the capability to store business metadata;
  • Should provide the capability to store technical metadata;
  • Should provide the capability to store operational metadata;
  • Should provide the capability to store semantic layer meta data ;
  • Should provide the capability to exchange metadata using common warehouse metamodel standards;
  • Should provide the capability to provide visibility to data currency, data quality, data definitions, and data lineage;
  • Should provide the capability for knowledge workers to browse metadata; and
  • Should consider common metadata management tools such as Informatica or Rochade, from ASG Software Solutions
Master Data Management Tools
  • Should provide a consistent framework of tools across multiple subject areas;
  • Should provide the capability to synchronize a master data of record repository with transactional source systems;
  • Should provide the capability to feed a master data reference repository;
  • Should maintain the golden data for all updates;
  • Should maintain a history of changes; and
  • Should consider common master data management tools such as Collaborative Information Manager, from TIBCO, Informatica MDM Hub, from Informatica, InfoSphere MDM, from IBM, Master Data Services, from Microsoft, Master Data Maestro, from Profisee, qMDM, from Dataflux/SAS, Netweaver Master Data Management, from SAP or Teradata MDM, from Teradata.
Summary...

Data Warehouse infrastructure is required to support data architecture, data integration, information delivery, and production and operations.

This site provide information management guidelines for data warehouse infrastructure