logo for information-management-architect.com
leftimage for information-management-architect.com

Data Warehouse Requirements

Improve requirements gathering process with David Bowman’s information management guidelines for data warehouse requirements

This site is designed for Information Technology professionals who need to improve effectiveness and require guidance and direction to help teams consistently produce error free results.

It provides a checklist of information management guidelines for specifying data warehouse requirements.

Why is this important?

Information management, or data warehousing projects, are data centric. This means that most of the requirements will be data related and require highly skilled specialists early in the project, to define requirements, and thru-out the project, to ensure requirements traceability and satisfaction of requirements.

Clearly defined data warehouse requirements will go a long way towards ensuring project success.

Design Constraints
  • Should specify legal and regulatory policies;
  • Should specify any requirements mandating the use of specific existing applications as data source;
  • Should specify any requirements to comprehend current business organizational structure;
  • Should specify any hardware limitations e.g.restrictions on the hardware configurations on which the system must be capable of functioning, e.g. platforms, networks, and storage devices;
  • Should specify any software limitations e.g.operating systems and databases;
  • Should specify any communications limitations e.g. network components, communications protocols, transmission formats, etc;
  • Should specify any architecture and technology standards;
  • Should specify production assurance certification standards;
  • Should specify data modeling standards e.g. if the system is required to be compliant with a corporate or specific data model;
  • Should specify operational environment standard;
  • Should specify requirements governing the geographical region in which the system must operate exist, e.g. USA, Europe or South America;
  • Should specify requirements governing the operating environment in which the system must function, e.g. data center, office, manufacturing site;
  • Should specify any requirements governing the physical environment in which the system must function, e.g. extreme temperature, high humidity; and
  • Should specify any requirements governing the physical portability required of the system, e.g. from hotels, airports, customer sites.
Data Warehouse Requirements for Performance
  • Should be quantifiable measures of the functioning of the system. As such, terminology such as ‘immediately’ or ‘as soon as possible’ is inappropriate;
  • Should be stated in terms of the end user’s experience e.g. it is not necessary to specify the performance characteristics of components with which the user does not interact;
  • Should specify requirements dealing with functions being available when needed, e.g. ad hoc reporting;
  • Should specify requirements for synchronization of databases;
  • Should specify requirements governing the replication of data across distributed platforms;
  • Should specify any constraints on acceptable response time e.g., within a certain number of seconds for a specific percentage of transactions, in general, or for specific functions or inquiry versus update;
  • Should specify response time requirements for specific queries or classes of queries;
  • Should specify any requirements that differ based upon characteristics of the data e.g. if queries of three-year-old data do not have the same performance requirements as queries of current data, state the two requirements separately;
  • Should specify constraints imposed due to peak and off-peak operation;
  • Should specify additional constraints imposed due to maximum and minimum volume of data;
  • Should specify any additional constraints imposed due to maximum or minimum frequency of access;
  • Should specify what abilities the system should have when an overload begins to occur;
  • Should specify the allowable degradation of service;
  • Should specify backup criteria to minimize overloading;
  • Should specify requirements around acceptable levels of performance for average, peak or burst load on the system e.g. load levels may be defined as average, which is the normal amount of activity handled by the system over a given period; peak, which is the maximum activity handled by the system at some period during the day or burst; which is the amount of activity that the system may have to handle following an abnormal event e.g. system comes back after being unavailable during peak usage period and a more than peak number of concurrent users login to the system within a one minute period;
  • Should specify maximum number of concurrent users, including support staff;
  • Should specify minimum number of transactions-per-time period to be supported by the system e.g. transactions are desired system capabilities as outlined in the business and functional requirements specification and there should be one requirement per transaction; 
  • Should specify minimum number of named/connected/active users to be supported by the system e.g. state by time of day, week, month or year if applicable;
  • Should specify average number of customer service support representatives;
  • Should specify maximum number of customer service representatives;
  • Should specify total number of customers in the system;   
  • Should specify maximum number of unique customer sessions per day e.g. how many unique customers will use system in a day;
  • Should specify total number of inquiry transactions processed in a day;
  • Should specify total number of update transactions processed in a day;
  • Should specify number of transactions that are submitted to external entity in a day;
  • Should specify minimum number of inquiry transactions per user session;
  • Should specify maximum number of inquiry transactions per user session;
  • Should specify minimum number of update transactions per user session;
  • Should specify maximum number of update transactions per user session;
  • Should specify if the system needs to measure and record performance metrics at a system component level for performance monitoring and/or reporting;
  • Should specify if there are reporting performance requirements in terms of report availability at a specific time or relative to a known event e.g. "The monthly summary report should be available for viewing/printing no later than eight hours after end-of-month processing completes”
Data Warehouse Requirements for Quality and Production Support

Should identify requirements necessary to ensure that the system is properly operated, monitored, and supported.
Ability to Audit
  • Should specify requirements to monitor transactions through the system for the purposes of auditing, error diagnosis, and performance management;
  • Should specify if there is information, which is required by external or internal regulatory entities, to be maintained for audit purposes;
  • Should specify what type of data access is required to information maintained for audit purposes e.g. online, hard copy report or electronic media;
  • Should specify the anticipated frequency of such audits;
  • Should specify the data retention requirements for information maintained for audit purposes e.g. online, hard copy report or electronic media;
  • Should specify if an audit log is required to track a transaction through the various states;
  • Should specify if an audit log needs to be configured to identify problem areas; and
  • Should specify if users need to notified if an error is detected.
Reliability
  • Should specify how dependable the system should be e.g. this includes accuracy of the processing, robustness (uptime) and consistency of performance, which is normally measured as “mean time between failures.”
  • Should specify how long the system should function without downtime;
  • Should specify how often maintenance can be scheduled;
  • Should specify if the system should produce the same results consistently under different operating conditions;
  • Should specify if the system will deliver the same data and same results to all the channels under any type of system conditions;
  • Should specify if any transaction failure might result in irrecoverable data loss; and
  • Should specify if all failures will report back relevant error messages to the user with clear instructions on recovery methods and next steps.
Availability
  • Should specify the amount of time, during normal use periods, that the system must be available (normally expressed as a percentage of the amount of time actually available over the amount of possible time available) during agreed upon service hours;
  • Should specify if there are specific hours of operation required of the system;
  • Should specify if there are  targets for amount of planned downtime on the system e.g. for maintenance;
  • Should specify if there are targets for amount of unplanned downtime on the system failure;
  • Should specify if there are specific targets for mean time between failures; and
  • Should specify if there are specific targets for acceptable recovery time after a failure e.g. the system must be at least 96% available overall.
Data Currency
  • Should define the currency of the data compared to the source, as required by the functionality of the system; and
  • Should specify if there are constraints governing the maximum allowable latency for data.
Flexibility
  • Should specify requirements related to the degree to which the system can be changed to include new business functions and technologies in the future;
  • Should specify if the application needs to be extensible in order to address future functionality and changes without having to be completely rewritten; and
  • Should specify if architecture needs to be innovative and flexible enough to accommodate related technological changes that could be leveraged in the future.
Accuracy
  • Should specify any requirements governing the editing of specific data attribute values;
  • Should specify requirements governing allowed-values for specific data attributes;
  • Should specify if there are data attributes that must be cross-edited with other data attribute values;
  • Should specify if there are data attributes that must be verified against external data sources; and
  • Should specify if there are data attributes that must be verified against internal data sources.
Consistency
  • Should specify if there are there specific constraints on GUI screen templates, including color, fonts, headers, footers, logos, disclaimers, etc;
  • Should specify if there are specific constraints on report templates, including fonts, headers, footers, disclaimers, etc;
  • Should specify if there are specific constraints on screen navigation;
  • Should specify if there are specific constraints on GUI “button” functions;
  • Should specify if there are specific constraints on consistency of response time e.g. in general, or across specific functions; and
  • Should specify if there are specific constraints on data access.
Maintainability
  • Should specify types and levels of maintenance;
  • Should specify how will updates be distributed;
  • Should specify what portions of the business requirements are expected to change most often; and
  • Should specify if user managed tables or parameters will be allowed, helpful or prohibited.
Data Warehouse Requirements for Scalability

Scalability is the ability to expand the system architecture to accommodate more users, more transactions and more data as additional users and data are added in the future. The existing systems are extended as far as possible without necessarily having to replace them.

This requirement directly affects the architecture as well as the selection of hardware and system software components.

Scalability requirements should specify if the application requires scalability across these three domains e.g. hardware, software or applications
Current loads, anticipated loads over the next 2-4 years as projected by the capacity metrics, as well as additional functionality that are to be added in future phases as outlined in the use case document should be supported by the architecture.

Appropriate hardware upgrades and addition of necessary hardware for horizontal scaling is acceptable.

Significant software, application or database changes should not be required to scale.

Data Warehouse Requirements for Monitoring
  • Should specify requirements for information, required by the system operator, to assist in monitoring the routine performance of the system;
  • Should specify requirements for type of data access required for performance monitoring data;
  • Should specify requirements for anticipated frequency of such monitoring;
  • Should specify requirements for data retention requirements for information maintained for system monitoring e.g. online, hard copy report or electronic media;
  • Should specify requirements for information, required by the operational user, to assist in answering inquiries against system transactions, e.g. error logging, transmission timestamps, etc;
  • Should specify requirements for information, required by the operational user, to assist in answering inquiries about the type or frequency of system access;
  • Should specify requirements for information, required by the operational user, to assist in answering inquiries about the type or frequency of system function utilization;
  • Should specify requirements for communicating overload situations to both the staff and customers; and
  • Should specify requirements for any analysis and reporting to be accomplished following overload situations.
Command and Control
  • Should specify architecture needs to provide overall runtime environment control functionality;
  • Should specify requirements for the ability to start and stop individual components and the overall environment;
  • Should specify architecture needs to provide services in support of load balancing;
  • Should specify if there is a need to control the system remotely;
  • Should specify if there is a need to control specific interfaces;
  • Should specify if there is a need to off load processes to backup devices during high peak traffic events e.g. backup, shared or processing; and
  • Should specify if there is a need to interrupt processes or devices.
Data Warehouse Requirements for Failure Management
  • Should specify requirements of the system when it encounters a failure e.g. software, hardware or connectivity, networks, routers, switches, etc;
  • Should specify if failures should be detected both automatically or manually;
  • Should specify how failures should be reported;
  • Should specify if alarm messages should be sent;
  • Should specify if system logs should be kept;
  • Should specify if a notification system is required;
  • Should specify what diagnostic analyzes or reports are required;
  • Should specify what failure recovery methods are required;
  • Should specify if failure recovery methods should be automated; and
  • Should specify the minimum time to recovery.
Logging
  • Should specify any standardized services for general logging functionality;
  • Should specify if data logs are required;
  • Should specify what data should be captured and logged;
  • Should specify the retention period for data logs; and
  • Should specify if access to these logs will be restricted.
Error Handling
  • Should specify the data required to perform diagnostic analysis to determine the problem source without the need to recreate the problem;
  • Should specify specific data that must be captured in order to determine the problem source;
  • Should specify components that must have error protection;
  • Should specify how the system administrator will be notified of system errors e.g. messages, logs or other communication;
  • Should specify if there are special diagnostic tools that must be used; and
  • Should specify if the system will automatically react to error conditions e.g. shut down, reboot, move to a degraded mode.
Data Migration or Conversion
  • Should specify data migration or data conversion requirements;
  • Should specify which broad classes of data must be converted;
  • Should specify which data should be excluded from the conversion based on its age, or other record-level attributes;
  • Should specify what  audit trails are required for the conversion process; and
  • Should specify what linkages  must be maintained between the converted data and the original data.
Data Warehouse Requirements for Security
  • Should specify requirements for LDAP, or an equivalent standard, to manage a centralized directory of security and authorization information used by the business intelligence environment e.g database level and applications level, as well as other IT infrastructure within the enterprise;
  • Should specify requirements for role-based security be used so that individuals have access rights in alignment with their job functions;
  • Should specify requirements for row and column level security for all data warehouse database platforms; and
  • Should specify requirements for encryption on anything that is sensitive or governed by regulations such as HIPPA.
Business Continuity Planning
  • Should identify the level of contingency planning for the system; and
  • Should specify the maximum allowable downtime for the system following a disaster e.g. tornado, hurricane, flood, fire, building collapse, etc.
Backup and Recovery Requirements
  • Should specify backup and recovery requirements to ensure that recovery from failures is timely and free of any significant data loss so as to support availability requirements;
  • Should specify forms of transportable media for system data back ups;
  • Should specify how frequently and when the system data should be backed up; and
  • Should specify if critical system data should be backed up more frequently.
Data Warehouse Requirements for Disaster Recovery
  • Should specify disaster recovery requirements so that mission critical systems have plans to recover from natural disasters within a reasonable amount of time;
  • Should specify any needs to integrate disaster recovery plans with existing plans supported by the enterprise;
  • Should specify the disaster recovery time required; and
  • Should specify if there a disaster recovery plan in place to support this recovery period.
Training Requirements

Should specify training requirements to capture what end-user and production support personnel will need to learn to use or operate the system when it is implemented.

Configuration Management

Should identify special requirements for version control of specific work products generated by the project. Normally, the project/application Software Configuration Management Plan should contain all details for configuration management.

Data Warehouse Requirements Specification Template

Under Construction  

Summary…

Information management projects are data centric. Most of the requirements are data related and require highly skilled specialists early in the project, to define requirements, and thru-out the project, to ensure requirements traceability and satisfaction of data warehouse requirements.

This site provided a checklist of information management guidelines for specifying data warehouse requirements.