
Data Integration

Improve roadmap planning process with David Bowman’s information management guidelines for data integration assessment

This site is designed for Information Technology professionals, solution architects and data architects who want to improve the data roadmap planning process.

It introduces data integration assessment and provides a checklist of information management guidelines for each assessment task.

What is Data Movement?

Data movement involves obtaining and storing data, and the architecture and processes required to support those activities. Data integration technologies, e.g. extract, transform and load (ETL), change data capture, and cleansing tools, are commonly used to support data movement.

The following diagram depicts a typical data warehouse and business intelligence environment.

[Diagram: data warehouse architecture]

Information management solutions usually involve some or all of the following:
  • Extracting data from operational source systems;
  • Transforming data as required to ensure consistency;
  • Loading data into a data warehouse or some other form of storage optimized for efficient storage and retrieval;
  • Extracting data from the data warehouse;
  • Loading departmental data marts optimized for reporting and analytics; and
  • Some form of reporting tool architecture, e.g. universe design or middleware design, to insulate users from the data warehouse.
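The extract, transform and load flow described above can be sketched in a few lines of Python. The source rows, the transformation rule (standardizing country codes) and the in-memory "warehouse" are illustrative assumptions, not a reference to any specific tool.

```python
# Minimal extract-transform-load sketch. Source data, rules and storage
# are hypothetical; real implementations use ETL tools or SQL pipelines.

def extract(source):
    """Pull raw rows from an operational source system."""
    return list(source)

def transform(rows):
    """Apply business rules to ensure consistency, e.g. standardize codes."""
    return [{**row, "country": row["country"].strip().upper()} for row in rows]

def load(rows, warehouse):
    """Append conformed rows to warehouse storage."""
    warehouse.extend(rows)
    return warehouse

source_rows = [{"id": 1, "country": " us "}, {"id": 2, "country": "Ca"}]
warehouse = []
load(transform(extract(source_rows)), warehouse)
print(warehouse)  # country codes standardized to "US" and "CA"
```

The same extract/transform/load pattern repeats between the warehouse and the departmental data marts, usually with aggregation rather than cleansing as the transformation step.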
Information is a valuable corporate asset and getting the right information, to the right people at the right time is a key management objective.

Achieving this objective requires robust technology and optimal data architecture based on industry best practices.

What is Roadmap Planning?

The objective of an assessment is to review the current technology, architecture, methodology and organizational readiness, compare them with industry best practices, and determine what changes, if any, are required to achieve information management objectives.

The plan to achieve these changes is called a roadmap and the tasks involved with creating this plan are called roadmap planning.

Data Movement Roadmap Planning Consulting Firms

Organizations frequently engage Information Technology management consulting firms to help with roadmap planning. These firms usually provide a team that includes a solution architect or management consultant with extensive experience in data warehouse or business intelligence planning, architecture and delivery. These resources are highly skilled and bring many years of successful architecture experience.

A solution architect must be supported by a set of well-established best practices that are typically only available from management consulting firms that specialize in information management and data warehouse solutions.

Consulting firms such as Deloitte Consulting, Ernst & Young, KPMG and Accenture have evolved data warehouse best practices and usually have the internal expertise required to assess data architecture and complete roadmap planning.

Acquiring new technology and/or upgrading existing technology is an expensive proposition. Organizations seeking data movement recommendations should look for consulting firms that provide vendor-neutral recommendations, which tend to be objective and unbiased.

Data Integration Assessment Process

The assessment should determine technical readiness for information management and data warehousing by comparing the current situation with the following checklist of information management guidelines.

Technology
  • Should capture all business rules for data transformation in the ETL tool meta data repository;
  • Should provide the capability to develop the code required to extract data from source systems, transform it per business transformation rules, and load it into the appropriate data warehouse storage environments;
  • Should provide the capability to schedule data extraction from source systems and transform and load it into appropriate staging areas, data warehouses and data marts;
  • Should provide repositories to store data integration software and common routines;
  • Should provide the capability to perform impact analysis; and
  • Should consider common data integration tools such as Ab Initio; PowerCenter, from Informatica; Ascential DataStage, from IBM; SQL Server Integration Services, from Microsoft; Business Objects Data Services, from SAP; Data Integration Studio, from SAS; Pentaho; or Open Studio for Data Integration, from Talend.
Source Code Management
  • Should manage all source code through a centralized repository with versioning, change tracking, and automated build processes;
  • Should include DDL for table and view construction, Unix scripts, AutoSys or Control-M scripts, data integration and business intelligence code, and documentation;
  • Should provide the capability to roll back if a significant defect caused by a recent code modification is discovered; and
  • Should consider common tools such as PowerCenter Version Control, from Informatica; Dimensions CM, from Serena; or Team Foundation Server, from Microsoft.
Configuration Management
  • Should provide the capability to catalog all infrastructure configuration items in a centralized repository;
  • Should provide access to development and production support teams;
  • Should maintain details about specific hardware models, documentation versions, software versions, patch levels, etc;
  • Should track acquisition and maintenance costs; and
  • Should consider common tools such as ServiceNow Configuration Management, from ServiceNow.
Master Data Management
  • Should provide a consistent framework of tools across multiple subject areas;
  • Should provide the capability to synchronize a master data of record repository with transactional source systems;
  • Should provide the capability to feed a master data reference repository;
  • Should maintain the golden data for all updates;
  • Should maintain a history of changes; and
  • Should consider common master data management tools such as Collaborative Information Manager, from TIBCO; Informatica MDM Hub, from Informatica; InfoSphere MDM, from IBM; Master Data Services, from Microsoft; Master Data Maestro, from Profisee; qMDM, from DataFlux/SAS; NetWeaver Master Data Management, from SAP; or Teradata MDM, from Teradata.
Source to Target Mapping Tools
  • Should use source to target mapping templates, such as Excel spreadsheets; and
  • Should consider software ranging from lightweight, cost-effective tools such as Qoofy to more complex tools such as the AnalytiX Mapping Manager data mapping tool.
Architecture
  • Should consistently meet service level expectations for performance, availability, data currency, and data quality;
  • Should consistently meet expectations for monitoring transactions through the system for the purposes of auditing, error diagnosis, and performance management;
  • Should consistently meet expectations for accuracy of processing, robustness (uptime) and consistency of performance, which is normally measured as “mean time between failures”; and
  • Should consistently meet expectations for the amount of time, during normal use periods, that the system must be available.
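The "mean time between failures" and availability measures mentioned above reduce to simple arithmetic; the figures below are purely hypothetical, chosen only to show the calculation.

```python
# Hypothetical service-level arithmetic: mean time between failures (MTBF)
# and availability over one reporting period. All figures are illustrative.

operating_hours = 720.0   # one 30-day month of scheduled operation
failure_count = 3         # observed failures in the period
downtime_hours = 1.5      # total time the system was unavailable

mtbf = (operating_hours - downtime_hours) / failure_count
availability = (operating_hours - downtime_hours) / operating_hours

print(f"MTBF: {mtbf:.1f} hours")            # (720 - 1.5) / 3 = 239.5
print(f"Availability: {availability:.2%}")  # 718.5 / 720 ≈ 99.79%
```

Tracking these figures per period lets an assessment compare actual performance against the service levels the organization has committed to.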
Methodology
  • Should include an enterprise conceptual data flow diagram, e.g. an enterprise conceptual data flow diagram will typically show the flow of data coming into the organization, within the organization, and out of the organization;
  • Should include logical data flow diagrams as part of the requirements definition/design process, e.g. these diagrams are typically used to show all the data files/tables that comprise a source feed and are helpful to confirm that requirements are complete; and
  • Should include source to target maps as input to the design phase, e.g. these are typically created using spreadsheets and show the source table/columns mapped to target tables/columns with transformation and data quality rules documented as input to the design process.
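A source to target map is essentially tabular data, whether it lives in a spreadsheet or elsewhere. The sketch below represents it as Python records; the table, column and rule names are hypothetical, and the point is only that simple design artifacts can be derived mechanically from the map.

```python
# A source-to-target map sketched as a list of mapping rows, mirroring the
# columns a spreadsheet template would have. All names are hypothetical.

mapping = [
    {"source": "crm.cust.cust_nm",  "target": "dw.customer.name",
     "rule": "trim and title-case"},
    {"source": "crm.cust.cntry_cd", "target": "dw.customer.country_code",
     "rule": "uppercase; must be a valid ISO 3166 alpha-2 code"},
]

# Design artifacts can be derived from the map, e.g. the target column list.
target_columns = [row["target"].split(".")[-1] for row in mapping]
print(target_columns)  # ['name', 'country_code']
```

Keeping the map in a structured form also makes it possible to check completeness, e.g. that every target column in the warehouse model appears in exactly one mapping row.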
Organization
  • Should have resources accountable for data integration architect roles and responsibilities;
  • Should have resources accountable for data integration developer roles and responsibilities;
  • Should have resources accountable for data integration technical support roles and responsibilities; and
  • Should have resources accountable for data integration software administrator roles and responsibilities.
Data Movement
  • Should support and track metadata, so that it can be reported upon and leveraged during on-going development work and data stewardship activities;
  • Should enforce a common approach towards developing the data movement modules, and enforce development standards;
  • Should use a minimum of physical steps and data movement tasks to move and transform data;
  • Should provide for efficient identification and processing of changed data, e.g. change data capture is a challenging component of any data acquisition solution and there are tools to simplify this process, but even a custom solution can be very successful;
  • Should leverage technology and design principles to ensure scalability; and
  • Should support manual data e.g. user-defined hierarchy for reporting.
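The custom change data capture approach mentioned in the list above can be sketched as a naive comparison of two snapshots keyed on a primary key. Production CDC tools usually read database transaction logs instead; the keys and rows below are illustrative.

```python
# Naive change-data-capture sketch: diff two snapshots of a table, keyed
# on the primary key, to find inserts, updates and deletes. Snapshot
# contents are hypothetical.

def capture_changes(previous, current):
    """Both snapshots are dicts mapping primary key -> row."""
    inserts = [current[k] for k in current.keys() - previous.keys()]
    deletes = [previous[k] for k in previous.keys() - current.keys()]
    updates = [current[k] for k in current.keys() & previous.keys()
               if current[k] != previous[k]]
    return inserts, updates, deletes

prev = {1: {"id": 1, "name": "Ann"}, 2: {"id": 2, "name": "Bob"}}
curr = {1: {"id": 1, "name": "Anne"}, 3: {"id": 3, "name": "Cal"}}
ins, upds, dels = capture_changes(prev, curr)
print(len(ins), len(upds), len(dels))  # 1 1 1
```

The snapshot-diff approach is simple and reliable but requires reading the full source each cycle; log-based capture scales better when sources are large or change frequently.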
Summary

Data movement involves obtaining and storing data, and the architecture and processes required to support those activities. Data integration technologies, e.g. extract, transform and load (ETL), change data capture, and cleansing tools, are commonly used to support data movement.

This site introduced data integration assessment and provided a checklist of information management guidelines for each assessment task.