 |
Data Warehouse Tools
Improve service with David Bowman’s
information management guidelines for data warehouse tools
This
site is designed for Information
Technology professionals who need to improve service and
require guidance and direction for selecting tools.
It provides information management guidelines for data warehouse tools.
Data Warehouse
Architecture
The following diagram depicts a typical data
warehouse and business intelligence environment.
Information management solutions are
usually involved with some, or all of, the following:
- Extracting data from operational source
systems;
- Transforming data as required to ensure
consistency;
- Loading data into a data warehouse or
some form of storage optimized for efficient storage;
- Extracting data from the data warehouse;
- Loading departmental data marts optimized
for reporting and analytics; and
- Some form of reporting tool architecture
e.g. universe design or some middle ware design to insulate users from
the data warehouse.
Data Warehouse tools are required to support data architecture, data
integration, information delivery, and production and operations.
Data Modeling
Tools
- Should capture all artifacts related to
the design of logical and physical data models;
- Should be used to generate the physical
data model from the
logical data model using the tool along with translation rules for
selective denormalization;
- Should use a standard industry logical
data model as a foundation for the modeling efforts e.g. an
enterprise logical data model (ELDM) provides a roadmap for populating
content into the information management environment. Some organizations
have developed their own models and others may have purchased a
commercial model specific for their industry. Companies such as ARDM
Software, Teradata and IBM Global Services have well established
enterprise
logical data models that can provide a jump-start to an ELDM;
- Should provide common warehouse metamodel
format to export meta data from the design tool into
downstream recipients e.g. data movement technology, business
intelligence tools, metadata management tool; and
- Should consider common tools such as
ER/Studio from Embarcadero, ERWin, from CA, PowerDesigner, from SAP, or
Rational Rose Data Modeler, from IBM or other lower cost options, such
as Oracle SQL Developer
Data Modeler, which is a free download.
Performance
Monitoring Tools
- Should provide visibility to workload
executing in the data warehouse
environment;
- Should provide aggregated resource e.g. CPU,
I/O, memory, locks, etc, with drill down to individual
users, queries, etc.
- Should provide historical play-back of
performance trends; and
- Should include common tools such Database
Performance Analyzer, from solarwinds
Data Warehouse Tools for Workload
Management
- Should provide gatekeeper, dynamic
workload prioritization, and
governors to manage workload executing in the EDW
environment;
- Should evaluate the priority and resource
requirements for incoming queries and assess available resources for
execution to decide if a query should be run immediately or
be deferred;
- Should efficiently allocate resources
within the platform according to query demands aligned to defined
service level goals for each query; and
- Should provide governors to
monitor for queries that exceed resource expectations or have
performance issues.
Usage
Analysis
- Should provide automated tracking and
reporting of data warehouse usage by user and application;
- Should track resource consumption;
- Should track detailed tracking down to
the level of specific
tables and columns within the data warehouse; and
- Should track the use
of historical data e.g. detailed data usage helps the database
management team design archiving
strategies and partitioning strategies.
Capacity
Planning
- Should analyze trends in resource
utilization and service level compliance combined with workload
profiling for current and future applications as
input to modeling/simulation tools for capacity planning;
- Should provide what-if
analysis to perform sensitivity analysis across a range of
workload assumptions;
- Should suggest
optimal investment strategies in terms of CPU, memory, disk spindle,
etc. configurations for maximizing performance for each dollar spent.
Impact
Analysis Tools
- Should provide the capability to determine
the impact of any changes to source systems, target database structures
in
the data warehouse, or business transformation rules;
- Should
provide a
listing of all extract programs, transformation rules, target table
structures, and business intelligence tool impacts resulting from a
change e.g. a column
length
increases from six bytes to twelve bytes; and
- Should typically be provided by the data
integration technology e.g Informatica.
Source
Code Management
- Should manage all source code through a
centralized repository with
versioning, change tracking, and automated build processes;
- Should include DDL for table and view
construction, Unix scripts, autosys or Control M scripts, data
integration and business intelligence code and documentation;
- Should provide the capability to rollback
if a significant defect caused by a recent code
modification is discovered; and
- Should consider common tools such
Informatica PowerCenter Version Control, from Informatica, Serena
Dimensions CM, from Serena or Microsoft Team Foundation Server, from
MicroSoft.
Configuration
Management Database
- Should provide the capability to catalog
all infrastructure configuration items in a
centralized repository;
- Should provide access to development and
production support teams;
- Should maintain details about specific
hardware models, documentation versions, software versions,
patch levels, etc;
- Should track acquisition and maintenance
costs; and
- Should consider common tools such
ServiceNow Configuration Management, from ServiceNow.
Incident
and Problem Tracking
- Should log all incidents in a logging tool with
classification and a description of the incident;
- Should provide search capabilities so that new
incidents can be easily linked and tracked along with existing and
previous incidents;
- Should provide the capability to report patterns of
repeated root causes; and
- Should consider common tools such Service Desk
Manager, from CA Technologies.
Defect
and Change Request Tracking
- Should provide the capability to document
and track all defects and change requests;
- Should provide the capability to document
and track all production incidents;
- Should provide the capability to review and
categorize change requests to ensure completeness of the
documentation e.g. defects should not be closed without adequate
documentation of root cause; and
- Should consider common tools such as
ServiceDesk Plus, from ManageEngine.
Data
Quality Management
- Should provide the capability to automate
data profiling;
- Should provide the capability to measure data quality
defects;
- Should provide the capability to create
alerts when data quality does not meet requirements;
- Should provide the capability to report
on trends;
- Should provide the capability to track
defects versus target service level
agreements; and
- Should consider common tools such as
Informatica Data Quality, from Informatica.
Quality Assurance Tools
- Should support software-testing;
- Should provide the capability to create test cases
and scripts;
- Should provide the capability to initiate of software
tests;
- Should support test status reporting ;
- Should support test defect management and reporting;
and
- Should consider common tools such as HP Quality
Center.
Summary...
Data Warehouse tools are required to support data
architecture, data
integration, information delivery, and production and operations.
This site provide information management guidelines for data warehouse
tools.
|
|