|
Data Mapping and Data Movement
Extract Transform and Load (ETL)
Need data mapping and data
movement practical standards and best practices for data
governance?
What
is data movement and data mapping?
The data movement discipline focuses
on the movement of data between
systems. In this context, “systems” include external data
sources, operational systems, and analytic data stores. Data
movement encompasses the extract-transform-load (ETL) facilities used
for bulk data movement. It also includes mechanisms
to support
continuous movement of discrete records, rows, or messages between
systems.
What are
mapping and data
movement best practices?
Data movement ETL best practices
provide a guide for the analysis, design,
and
development of data movement processes that are consistent, usable and
of high quality.
Why have data movement best practices?
Common development best practices are
important for the following
reasons:
- They lead to greater consistency,
which subsequently leads to greater productivity;
- They reduce ongoing maintenance
costs; and
- They improve readability of software, making it
easier
for developers to understand new code more quickly.
Data mapping and data movement best
practice goals
- Introduce common, consistent
data
movement analysis, design, and coding patterns;
- Develop
reusable,
enterprise-wide analysis, design, and construction components
through data movement modeling processes using data movement tools, to
ensure an acceptable level of data quality per business specifications;
- Introduce best practices and consistency
in
coding and naming standards;
- Reduce costs to
develop and maintain analysis,
design and source code deliverables; and
- Integrate controls
into the data movement
process to ensure data quality
and integrity
What are
some
data movement model best practices?
- An
enterprise conceptual data movement model
should be created as part of
the information management strategy. This model is part of the business
model and shows what data flows into, within, and out of the
organization.
This
is a “high-level” model that shows movement
of data from one application to another. Order entry systems may
capture
customer name, address information, and later send it to the
application, which handles billing and accounting so that bills can be
sent out and
money collected
Think of the conceptual data model as
an
architect’s conceptual drawing of a house. It provides a good idea of
what is required with very little additional detail.
- A
logical data movement model should show data movement
requirements at
the dataset (entity/table)-level. It should detail the transformation
rules and target logical datasets (entity/tables). This model is still
considered technology independent.
It should
show each
interface between systems. The order entry system, which sends customer
name and address information to the billing application, may actually
send three or four files. Each file is called an interface. Each
interface should
be shown in the logical data movement model.
The focus at the logical level is on
the capture of actual source
tables and proposed target stores.
The
logical data movement model should be supported by a source to
target
mapping document, which should include business rules and
data
transformation rules.
- The
physical data movement model is a detailed representation
of the data
movement requirements at the dataset (table)-level that details the
transformation rules and target physical datasets (tables).
This
model is considered technology dependent. Best practice dictates
that there may be one-to-many physical models for each logical model.
Data mapping and data movement model
tools
Some
of the tools have the ability to create data flow
diagrams. Best
practices suggest that these tools be used to create and
document
the data movement logical and physical models.
What data movement best practices are
required?
The following
topics should be covered by
best practices:
- Architecture
and design;
- Testing;
- Configuration management;
- Data movement software
and technologies should standardize
on a common data movement tool and ensure that this is included in the
information management technology standards list. Some
common tools include:
- Ab initio;
- Ascential datastage; and
- Informatica.
Some “open
source” ETL tools should also be considered.
- Standardize
on a job-scheduling
tool such as:
- Control-M for job scheduling; or
- Autosys
- Accountabilities
should be defined
and communicated to all stakeholders for each
key role in data
movement.
Data movement standards and best
practices checklist
Data movement standards and best practices should be established to
address the following topics.
Standards
for data flow diagram.
Best
practices for data profiling including what the profile should contain,
what tools should be used to create the profile and who should be
involved with data profiling.
Performance
standards for data warehouse experts e.g. contractors, should be
established with clearly defined job descriptions based on information
requirements.
Standards for data
management services e.g. outsourcing data mapping and
data movement development should be established if these
services
are part of the data management practice.
Standards for data
feed meta data.
Best practices for generic data
formats and meta data driven data movement routines should be
considered.
Standard
source to target data mapping templates.
ETL
data mapping activities should be included as part of
the project management methodology requirements analysis phase.
Best practices to handle missing data.
Best practices to
standardize specialized data extractor
routines should be considered.
Standards for data
sets and data set meta data.
Best practices for data
maintenance should be considered.
Standards for data
transmission must be established including standards for signal files.
Best practices extract data from source
systems.
Standards for data
file documentation.
Best practices for ETL and
data warehouse should be created.
Standard routines for data
validation should be considered.
Best practices to prevent data
loss should be established.
Common data
cleansing routines should be developed.
Data movement extract
transform load best practices should be developed.
A common architecture should
be established to ensure complete data
integration.
Best practices to ensure
successful data migration between
applications.
Standards for customer
data and information integration should be considered.
Common transformation
routines should be developed.
Performance standards and
best practices should be considered for data
warehouse loading.
Data
warehouse performance requirements should be defined in the
requirements analysis phase.
Standard data
recovery procedures and routines should be developed.
Best practices to export
data from data warehouses should be created.
Best practices for using
"fast load" techniques e.g. data pump should
be created.
Standards to ensure data
integrity should be established.
Standard data
restore routines should be established.
Best practices for data
import into data marts should be established.
Standards and best practices
for master data management should be created.
Standards for data
management and enterprise application integration should be established.
Best practices for data
transfer between applications should be established
Bets practices for enterprise
information integration should be considered.
Best practices for data
replication between applications should be established.
Business rules for data
sharing between applications should be created and validated by data
guardians and data stewards.
Technical data
exchange and data interchange standards should be established.
Summary...
Data management is a sub-set of
information management that governs
organization and control of the structure and design, storage,
movement, security and quality of information.
Data mapping and data movement
standards and best
practices are required to ensure rapid project delivery and optimal
return on information management investment.
|