logo for information-management-architect.com
leftimage for information-management-architect.com

Data Warehouse Best Practices

Improve service with David Bowman’s information management guidelines for data warehouse best practices

This site is designed for Information Technology professionals who need to improve service and want guidance and direction to help teams consistently produce error free results.

It provides information management guidelines for data warehouse best practices.

Data Warehouse Architecture

The following diagram depicts a typical data warehouse and business intelligence environment.

data warehouse architecture

Information management solutions are usually involved with some, or all of, the following:
  • Extracting data from operational source systems;
  • Transforming data as required to ensure consistency;
  • Loading data into a data warehouse or some form of storage optimized for efficient storage;
  • Extracting data from the data warehouse;
  • Loading departmental data marts optimized for reporting and analytics; and
  • Some form of reporting tool architecture e.g. universe design or some middle ware design to insulate users from the data warehouse.
Information is a valuable corporate asset and getting the right information, to the right people at the right time is a key management objective.

Achieving this objective requires robust technology and optimal data architecture based on industry best practices.

Data Warehouse Best Practices

Performance Management Guidelines
  • Should specify expectations for dealing with functions being available when needed, e.g., ad hoc reporting;
  • Should specify expectations for synchronization of databases;
  • Should specify expectations governing the replication of data/function across distributed platforms;
  • Should specify expectations on acceptable response time e.g. within a certain number of seconds for a specific percentage of transactions;
  • Should specify expectations for response time for specific queries or classes of queries;
  • Should specify expectations for response times that differ based upon characteristics of the data e.g. if queries of three-year-old data do not have the same performance requirements as queries of current data, the two expectations should be stated separately;
  • Should specify expectations for constraints imposed due to peak/off-peak operation;
  • Should specify expectations for maximum/minimum volume of data;
  • Should specify expectations for what abilities the system should have when an overload begins to occur;
  • Should specify expectations for the allowable degradation of service;
  • Should specify expectations for backup criteria to minimize overloading;
  • Should specify expectations for acceptable levels of performance for average, peak or burst load on the system e.g. load levels may be defined as follows: average, the normal amount of activity handled by the system over a given period; peak, the maximum activity handled by the system at some period during the day; or burst, the amount of activity that the system may have to handle following an abnormal event e.g. system comes back after being unavailable during peak usage period and a more than peak number of concurrent users login to the system within a one minute period;
  • Should specify expectations for maximum number of concurrent users;
  • Should specify expectations for minimum number of named/connected/active users to be supported by the system;
  • Should specify expectations for average number of customer service support representatives;
  • Should specify expectations for maximum number of customer service representatives;    
  • Should specify expectations for total number of customers in the system;
  • Should specify expectations for maximum number of unique customer sessions per day;
  • Should specify expectations for total number of inquiry transactions processed in a day;
  • Should specify expectations for total number of update transactions processed in a day;
  • Should specify expectations for number of transactions that are submitted to external entities in a day;
  • Should specify expectations for minimum number of inquiry transactions per user session;
  • Should specify expectations for maximum number of inquiry transactions per user session;
  • Should specify expectations for minimum number of update transactions per user session;
  • Should specify expectations for maximum number of update transactions per user session;
  • Should specify expectations if the system needs to measure and record performance metrics at a system component level for performance monitoring and/or reporting; and
  • Should specify expectations for reporting performance metrics e.g these should be stated in terms of report availability at a specific time or relative to a known event e.g.  “The monthly summary report should be available for viewing/printing no later than eight hours after end-of-month processing completes”
Data Warehouse Best Practices for Quality and Production Support

Access Auditing
  • Should specify expectations for monitoring transactions through the system for the purposes of auditing, error diagnosis, and performance management;
  • Should specify expectations for information, required by external or internal regulatory entities, to be maintained for audit purposes;
  • Should specify expectations for types of data audit access is required;
  • Should specify expectations for anticipated frequency of such audits;
  • Should specify expectations for audit data retention;
  • Should specify expectations for audit logs; and
  • Should specify expectations for user notifications of errors.
Reliability
  • Should specify expectations for how dependable the system will be e.g. accuracy of processing, robustness, uptime, and consistency of performance, which is normally measured as “mean time between failures.”
  • Should specify expectations for how long the system should the function without downtime;
  • Should specify expectations for how often maintenance can be scheduled;
  • Should specify expectations for the system to produce the same results consistently under different operating conditions;
  • Should specify expectations for the delivery of the same data and same results to all the channels under any type of system conditions;
  • Should specify expectations if any transaction failure will result in irrecoverable data loss; and
  • Should specify expectations if all failures will report back relevant error messages to the user with clear instructions on recovery methods and next steps.
Data Warehouse Best Practices for Availability
  • Should specify expectations for the amount of time, during normal use periods, that the system must be available, normally expressed as a percentage of the amount of time actually available over the amount of possible time available, during agreed upon service hours e.g since the system constitutes software (including applications), hardware, and network components, this requirement extends to all three of them;
  • Should specify expectations for specific hours of operation;
  • Should specify expectations for targets for amount of planned downtime;
  • Should specify expectations for the amount of unplanned downtime;
  • Should specify expectations for specific targets for mean time between failures;
  • Should specify expectations for specific targets for acceptable recovery time after a failure e.g.the system must be at least 96% available overall;
  • Should consider the data warehouse environment as a mission critical application with availability 7X24;
  • Should ensure that data loading, backups, and most operations are performed “on-line” to preserve access to the analytic environment; and
  • Should ensure planned downtime is taken rarely and should usually be related to capacity and/or software upgrades.
Data Currency
  • Should specify expectations for currency of the data compared to the source, as required by the functionality of the system; and
  • Should specify expectations for constraints governing the maximum allowable latency for data.
Accuracy
  • Should specify expectations for governing the editing of specific data attribute values;
  • Should specify expectations for governing allowed-values for specific data attributes;
  • Should specify expectations for data attributes that must be cross-edited with other data attribute values;
  • Should specify expectations for data attributes that must be verified against external data sources; and
  • Should specify expectations for data attributes that must be verified against internal data sources.
Maintainability
  • Should specify expectations for types and levels of maintenance; and
  • Should specify expectations for how updates will be distributed.
Data Warehouse Best Practices for Monitoring
  • Should specify expectations for information, required by the system operator, to assist in monitoring the routine performance of the system;
  • Should specify expectations for type of data access required for performance monitoring data;
  • Should specify expectations for anticipated frequency of such monitoring;
  • Should specify expectations for data retention requirements for information maintained for system monitoring e.g. online, hardcopy report or electronic media;
  • Should specify expectations for information, required by the operational user, to assist in answering inquiries against system transactions, e.g. error logging, transmission timestamps, etc;
  • Should specify expectations for information, required by the operational user, to assist in answering inquiries about the type or frequency of system access;
  • Should specify expectations for information, required by the operational user, to assist in answering inquiries about the type or frequency of system function utilization;
  • Should specify expectations for communicating overload situations to both the staff and customers; and
  • Should specify expectations for any analysis and reporting to be accomplished following overload situations.
Data Warehouse Best Practices for Failure Management
  • Should specify expectations of the system when it encounters a failure e.g. software, hardware or connectivity, networks, routers, switches, etc;
  • Should specify if failures should be detected both automatically or manually;
  • Should specify how failures will be reported;
  • Should specify how alarm messages will be sent;
  • Should specify how system logs will be kept;
  • Should specify what diagnostic analyzes or reports should be used;
  • Should specify what failure recovery methods should be used;
  • Should specify how failure recovery methods will be automated; and
  • Should specify minimum time to recovery.
Data Warehouse Best Practices for Logging
  • Should specify expectations for standardized services for general logging functionality;
  • Should specify expectations for which data logs will be required;
  • Should specify expectations for what data should be captured and logged;
  • Should specify expectations for the retention period for data logs; and
  • Should specify expectations for how access to these logs will be restricted.
Data Warehouse Best Practices for Error Handling
  • Should specify expectations for any data required to perform diagnostic analysis to determine the problem source without the need to recreate the problem;
  • Should specify expectations for any specific data that must be captured in order to determine the problem source;
  • Should specify expectations for components that must have error protection;
  • Should specify expectations for how the user/system administrator should be notified of system errors e.g. messages, logs or other communication;
  • Should specify expectations for any special diagnostic tools that must be used; and
  • Should specify expectations how, or if, the system should automatically react to error conditions e.g. shut down, reboot, move to a degraded mode.
Data Warehouse Best Practices for Security
  • Should use LDAP, or an equivalent standard, to manage a centralized directory of security and authorization information used by the business intelligence environment, database level and applications level, as well as other Information Technology  infrastructure within the enterprise;
  • Should use role-based security so that individuals have access rights in alignment with their job functions, rather than to all data;
  • Should have row and column level security for all data warehouse database platforms; and
  • Should encrypt anything that is sensitive or governed by regulations such as HIPPA.
Business Continuity Planning
  • Should identify the level of contingency planning for the system; and
  • Should specify expectations for the maximum allowable downtime for the system following a disaster e.g. tornado, hurricane, flood, fire, building collapse, etc.
Data Warehouse Best Practices for Backup and Recovery
  • Should specify expectations for backup and recovery service levels to ensure that recovery from failures is timely and free of any significant data loss so as to support availability expectations;
  • Should specify expectations for any specific form of transportable media for system data back ups;
  • Should specify expectations for how frequently and when the system data should be backed up; and
  • Should specify expectations for how critical system data should be backed up more frequently.
Disaster Recovery
  • Should specify expectations for disaster recovery service levels so that mission critical systems have plans to recover from natural disasters within a reasonable amount of time; and
  • Should specify expectations for the disaster recovery time required.
Data Warehouse Best Practices for Service Level Agreements
  • Should be documented in detail for performance, availability, data currency, and data quality;
  • Should be negotiated with business users and should be used as the basis for capacity planning and architecture considerations;
  • Should be used to measure performance of actual service level delivery; and
  • Should be reviewed with business on a quarterly basis.
Summary...

Information is a valuable corporate asset and getting the right information, to the right people at the right time is a key management objective.

Achieving this objective requires robust technology and optimal data architecture based on industry best practices.

This site provided information management guidelines for data warehouse best practices.