logo for information-management-architect.com
leftimage for information-management-architect.com

Extract Transform Load

Need to manage extract transform load and want practical standards and best practices for business intelligence governance and accountability?

What is information management?Extract Transform Load

Information management is a corporate management process that governs accountability for the structure and design, storage, movement, security, quality, delivery and usage of information required for management and business intelligence purposes.

What is data movement?

The extract transform load data movement discipline focuses on the movement of data between systems.  In this context, “systems” include external data sources, operational systems, and analytic data stores. 

Data movement encompasses the extract transform load (ETL) facilities used for bulk  data movement.  It also includes mechanisms to support continuous movement of discrete records, rows, or messages between systems.

What are data movement best practices?

Data movement ETL best practices provide a guide for the analysis, design, and development of data movement processes that are consistent, usable and of high quality.

Why have data movement best practices?

Common development best practices are important for the following reasons:
  • They lead to greater consistency, which subsequently leads to greater productivity;
  • They reduce ongoing maintenance costs; and
  • They improve readability of software, making it easier for developers to understand new code more quickly.
Best practice goals
  • Introduce common, consistent data movement analysis, design, and coding patterns;
  • Develop reusable, enterprise-wide analysis, design, and construction components through data movement modeling processes using data movement tools, to ensure an acceptable level of data quality per business specifications;
  • Introduce best practices and consistency in coding and naming standards;
  • Reduce costs to develop and maintain analysis, design and source code deliverables; and 
  • Integrate controls into the data movement process to ensure data quality and integrity
What are some data movement model best practices?
  • An enterprise conceptual data movement model should be created as part of the information management strategy. This model is part of the business model and shows what data flows into, within, and out of the organization.
This is a “high-level” model that shows movement of data from one application to another. Order entry systems may capture customer name, address information, and later send it to the application, which handles billing and accounting so that bills can be sent out and money collected

Think of the conceptual data model as an architect’s conceptual drawing of a house. It provides a good idea of what is required with very little additional detail.
  • A logical data movement model should show data movement requirements at the dataset (entity/table)-level. It should detail the transformation rules and target logical datasets (entity/tables). This model is still considered technology independent.
It should show each interface between systems. The order entry system, which sends customer name and address information to the billing application, may actually send three or four files. Each file is called an interface. Each interface should be shown in the logical data movement model.

The focus at the logical level is on the capture of actual source tables and proposed target stores.

The logical data movement model should be supported by a source to target mapping document, which should include business rules and data transformation rules.
  • The physical data movement model is a detailed representation of the data movement requirements at the dataset (table)-level that details the transformation rules and target physical datasets (tables).
This model is considered technology dependent. Best practice dictates that there may be one-to-many physical models for each logical model.

Data movement model tools

Some of the extract transform load tools have the ability to create data flow diagrams. Best practices suggest that these tools be used to create and document the data movement logical and physical models.

What data movement best practices are required?

The following topics should be covered by best practices:
  • Architecture and design;
  • Testing;
  • Configuration management;
  • Data movement software and technologies should standardize on a common data movement tool and ensure that this is included in the information management technology standards list. Some common tools include: 
    • Ab initio;
    • Ascential datastage; and
    • Informatica.
Some “open source” ETL tools should also be considered. 
  • Standardize on a job-scheduling tool such as:
    • Control-M for job scheduling; or
    • Autosys  
  • Accountabilities should be defined and communicated to all stakeholders for each key role in data movement.

Data management is a sub-set of information management that governs organization and control of the structure and design, storage, movement, security and quality of information.

Enterprise business intelligence requires extract transform load data movement standards to ensure rapid project delivery and optimal return on information management investment.

footer for Information management page