logo for information-management-architect.com
leftimage for information-management-architect.com

Data Mapping and Data Movement
Extract Transform and Load (ETL)

Need data mapping and data movement practical standards and best practices for data governance?

What is data movement and data mapping?Data Mapping

The data movement discipline focuses on the movement of data between systems.  In this context, “systems” include external data sources, operational systems, and analytic data stores.  Data movement encompasses the extract-transform-load (ETL) facilities used for bulk  data movement.  It also includes mechanisms to support continuous movement of discrete records, rows, or messages between systems.

What are mapping and data movement best practices?

Data movement ETL best practices provide a guide for the analysis, design, and development of data movement processes that are consistent, usable and of high quality.

Why have data movement best practices?

Common development best practices are important for the following reasons:
  • They lead to greater consistency, which subsequently leads to greater productivity;
  • They reduce ongoing maintenance costs; and
  • They improve readability of software, making it easier for developers to understand new code more quickly.
Data mapping and data movement best practice goals
  • Introduce common, consistent data movement analysis, design, and coding patterns;
  • Develop reusable, enterprise-wide analysis, design, and construction components through data movement modeling processes using data movement tools, to ensure an acceptable level of data quality per business specifications;
  • Introduce best practices and consistency in coding and naming standards;
  • Reduce costs to develop and maintain analysis, design and source code deliverables; and 
  • Integrate controls into the data movement process to ensure data quality and integrity
What are some data movement model best practices?
  • An enterprise conceptual data movement model should be created as part of the information management strategy. This model is part of the business model and shows what data flows into, within, and out of the organization.
This is a “high-level” model that shows movement of data from one application to another. Order entry systems may capture customer name, address information, and later send it to the application, which handles billing and accounting so that bills can be sent out and money collected

Think of the conceptual data model as an architect’s conceptual drawing of a house. It provides a good idea of what is required with very little additional detail.
  • A logical data movement model should show data movement requirements at the dataset (entity/table)-level. It should detail the transformation rules and target logical datasets (entity/tables). This model is still considered technology independent.
It should show each interface between systems. The order entry system, which sends customer name and address information to the billing application, may actually send three or four files. Each file is called an interface. Each interface should be shown in the logical data movement model.

The focus at the logical level is on the capture of actual source tables and proposed target stores.

The logical data movement model should be supported by a source to target mapping document, which should include business rules and data transformation rules.
  • The physical data movement model is a detailed representation of the data movement requirements at the dataset (table)-level that details the transformation rules and target physical datasets (tables).
This model is considered technology dependent. Best practice dictates that there may be one-to-many physical models for each logical model.

Data mapping and data movement model tools

Some of the tools have the ability to create data flow diagrams. Best practices suggest that these tools be used to create and document the data movement logical and physical models.

What data movement best practices are required?

The following topics should be covered by best practices:
  • Architecture and design;
  • Testing;
  • Configuration management;
  • Data movement software and technologies should standardize on a common data movement tool and ensure that this is included in the information management technology standards list. Some common tools include: 
    • Ab initio;
    • Ascential datastage; and
    • Informatica.
Some “open source” ETL tools should also be considered. 
  • Standardize on a job-scheduling tool such as:
    • Control-M for job scheduling; or
    • Autosys  
  • Accountabilities should be defined and communicated to all stakeholders for each key role in data movement.


Data movement standards and best practices checklist


Data movement standards and best practices should be established to address the following topics.

Standards for data flow diagram.
Best practices for data profiling including what the profile should contain, what tools should be used to create the profile and who should be involved with data profiling.
Performance standards for data warehouse experts e.g. contractors, should be established with clearly defined job descriptions based on information requirements.
Standards for data management services e.g. outsourcing data mapping and data movement development should be established if these services are part of the data management practice.
Standards for data feed meta data.
Best practices for generic data formats and meta data driven data movement routines should be considered.
Standard source to target data mapping templates.
ETL data mapping activities should be included as part of the project management methodology requirements analysis phase.
Best practices to handle missing data.
Best practices to standardize specialized data extractor routines should be considered.
Standards for data sets and data set meta data.
Best practices for data maintenance should be considered.
Standards for data transmission must be established including standards for signal files.
Best practices extract data from source systems.
Standards for data file documentation.
Best practices for ETL and data warehouse should be created.
Standard routines for data validation should be considered.
Best practices to prevent data loss should be established.
Common data cleansing routines should be developed.
Data movement extract transform load best practices should be developed.
A common architecture should be established to ensure complete data integration.
Best practices to ensure successful data migration between applications.
Standards for customer data and information integration should be considered.
Common transformation routines should be developed.
Performance standards and best practices should be considered for data warehouse loading.
Data warehouse performance requirements should be defined in the requirements analysis phase.
Standard data recovery procedures and routines should be developed.
Best practices to export data from data warehouses should be created.
Best practices for using "fast load" techniques e.g. data pump should be created.
Standards to ensure data integrity should be established.
Standard data restore routines should be established.
Best practices for data import into data marts should be established.
Standards and best practices for master data management should be created.
Standards for data management and enterprise application integration should be established.
Best practices for data transfer between applications should be established
Bets practices for enterprise information integration should be considered.
Best practices for data replication between applications should be established.
Business rules for data sharing between applications should be created and validated by data guardians and data stewards.
Technical data exchange  and data interchange standards should be established.

Summary...


Data management is a sub-set of information management that governs organization and control of the structure and design, storage, movement, security and quality of information.

Data mapping and data movement standards and best practices are required to ensure rapid project delivery and optimal return on information management investment.

footer for Information management page