logo for information-management-architect.com
leftimage for information-management-architect.com

Data Warehouse Tools

Improve service with David Bowman’s information management guidelines for data warehouse tools

This site is designed for Information Technology professionals who need to improve service and require guidance and direction for selecting tools.

It provides information management guidelines for data warehouse tools.

Data Warehouse Architecture

The following diagram depicts a typical data warehouse and business intelligence environment.

data warehouse architecture

Information management solutions are usually involved with some, or all of, the following:
  • Extracting data from operational source systems;
  • Transforming data as required to ensure consistency;
  • Loading data into a data warehouse or some form of storage optimized for efficient storage;
  • Extracting data from the data warehouse;
  • Loading departmental data marts optimized for reporting and analytics; and
  • Some form of reporting tool architecture e.g. universe design or some middle ware design to insulate users from the data warehouse.
Data Warehouse tools are required to support data architecture, data integration, information delivery, and production and operations.

Data Modeling Tools
  • Should capture all artifacts related to the design of logical and physical data models;
  • Should be used to generate the physical data model from the logical data model using the tool along with translation rules for selective denormalization;
  • Should use a standard industry logical data model as a foundation for the modeling efforts e.g. an enterprise logical data model (ELDM) provides a roadmap for populating content into the information management environment. Some organizations have developed their own models and others may have purchased a commercial model specific for their industry. Companies such as ARDM Software, Teradata and IBM Global Services have well established enterprise logical data models that can provide a jump-start to an ELDM;
  • Should provide common warehouse metamodel format to export meta data from the design tool into downstream recipients e.g. data movement technology, business intelligence tools, metadata management tool; and
  • Should consider common tools such as ER/Studio from Embarcadero, ERWin, from CA, PowerDesigner, from SAP, or Rational Rose Data Modeler, from IBM or other lower cost options, such as Oracle SQL Developer Data Modeler, which is a free download.
Performance Monitoring Tools
  • Should provide visibility to workload executing in the data warehouse environment;
  • Should provide aggregated resource e.g. CPU, I/O, memory, locks, etc, with drill down to individual users, queries, etc. 
  • Should provide historical play-back of performance trends; and
  • Should include common tools such Database Performance Analyzer, from solarwinds
Data Warehouse Tools for Workload Management
  • Should provide gatekeeper, dynamic workload prioritization, and governors to manage workload executing in the EDW environment;
  • Should evaluate the priority and resource requirements for incoming queries and assess available resources for execution to decide if a query should be run immediately or be deferred;
  • Should efficiently allocate resources within the platform according to query demands aligned to defined service level goals for each query; and
  • Should provide governors to monitor for queries that exceed resource expectations or have performance issues.
Usage Analysis
  • Should provide automated tracking and reporting of data warehouse usage by user and application;
  • Should track resource consumption;
  • Should track detailed tracking down to the level of specific tables and columns within the data warehouse; and
  • Should track the use of historical data e.g. detailed data usage helps the database management team design archiving strategies and partitioning strategies.
Capacity Planning
  • Should analyze trends in resource utilization and service level compliance combined with workload profiling for current and future applications as input to modeling/simulation tools for capacity planning;
  • Should provide what-if analysis to perform sensitivity analysis across a range of workload assumptions;
  • Should suggest optimal investment strategies in terms of CPU, memory, disk spindle, etc. configurations for maximizing performance for each dollar spent.
Impact Analysis Tools
  • Should provide the capability to determine the impact of any changes to source systems, target database structures in the data warehouse, or business transformation rules;
  • Should provide a listing of all extract programs, transformation rules, target table structures, and business intelligence tool impacts resulting from a change e.g. a column length  increases from six bytes to twelve bytes; and
  • Should typically be provided by the data integration technology e.g Informatica.
Source Code Management
  • Should manage all source code through a centralized repository with versioning, change tracking, and automated build processes;
  • Should include DDL for table and view construction, Unix scripts, autosys or Control M scripts, data integration and business intelligence code and documentation;
  • Should provide the capability to rollback if a significant defect caused by a recent code modification is discovered; and
  • Should consider common tools such Informatica PowerCenter Version Control, from Informatica, Serena Dimensions CM, from Serena or Microsoft Team Foundation Server, from MicroSoft.
Configuration Management Database
  • Should provide the capability to catalog all infrastructure configuration items in a centralized repository;
  • Should provide access to development and production support teams;
  • Should maintain details about specific hardware models, documentation versions, software versions, patch levels, etc;
  • Should track acquisition and maintenance costs; and
  • Should consider common tools such ServiceNow Configuration Management, from ServiceNow.
Incident and Problem Tracking
  • Should log all incidents in a logging tool with classification and a description of the incident;
  • Should provide search capabilities so that new incidents can be easily linked and tracked along with existing and previous incidents;
  • Should provide the capability to report patterns of repeated root causes; and
  • Should consider common tools such Service Desk Manager, from CA Technologies.
Defect and Change Request Tracking
  • Should provide the capability to document and track all defects and change requests;
  • Should provide the capability to document and track all production incidents;
  • Should provide the capability to review and categorize change requests to ensure completeness of the documentation e.g. defects should not be closed without adequate documentation of root cause; and
  • Should consider common tools such as ServiceDesk Plus, from ManageEngine.
Data Quality Management
  • Should provide the capability to automate data profiling;
  • Should provide the capability to measure data quality defects;
  • Should provide the capability to create alerts when data quality does not meet requirements;
  • Should provide the capability to report on trends;
  • Should provide the capability to track defects versus target service level agreements; and
  • Should consider common tools such as Informatica Data Quality, from Informatica.
Quality Assurance Tools
  • Should support software-testing;
  • Should provide the capability to create test cases and scripts;
  • Should provide the capability to initiate of software tests;
  • Should support test status reporting ;
  • Should support test defect management and reporting; and
  • Should consider common tools such as HP Quality Center.
Summary...

Data Warehouse tools are required to support data architecture, data integration, information delivery, and production and operations.

This site provide information management guidelines for data warehouse tools.