|
Extract Transform Load
ETL
Need
to manage extract transform load and want practical standards and best
practices for business intelligence governance and accountability?
What
is information management?
Information
management is a corporate management process that governs
accountability for the structure and design, storage, movement,
security, quality, delivery and usage of information required for
management and business intelligence purposes.
What
is data movement?
The extract transform load data movement discipline focuses on the movement of data between
systems. In this context, “systems” include external data
sources, operational systems, and analytic data stores.
Data
movement encompasses the extract transform load (ETL) facilities used
for bulk data movement. It also includes mechanisms
to support
continuous movement of discrete records, rows, or messages between
systems.
What are data
movement best practices?
Data movement ETL best practices
provide a guide for the analysis, design,
and
development of data movement processes that are consistent, usable and
of high quality.
Why have data movement best practices?
Common development best practices are
important for the following
reasons:
- They lead to greater consistency,
which subsequently leads to greater productivity;
- They reduce ongoing maintenance
costs; and
- They improve readability of software, making it
easier
for developers to understand new code more quickly.
Best practice goals
- Introduce common, consistent
data
movement analysis, design, and coding patterns;
- Develop
reusable,
enterprise-wide analysis, design, and construction components
through data movement modeling processes using data movement tools, to
ensure an acceptable level of data quality per business specifications;
- Introduce best practices and consistency
in
coding and naming standards;
- Reduce costs to
develop and maintain analysis,
design and source code deliverables; and
- Integrate controls
into the data movement
process to ensure data quality
and integrity
What are some
data movement model best practices?
- An
enterprise conceptual data movement model
should be created as part of
the information management strategy. This model is part of the business
model and shows what data flows into, within, and out of the
organization.
This
is a “high-level” model that shows movement
of data from one application to another. Order entry systems may
capture
customer name, address information, and later send it to the
application, which handles billing and accounting so that bills can be
sent out and
money collected
Think of the conceptual data model as
an
architect’s conceptual drawing of a house. It provides a good idea of
what is required with very little additional detail.
- A
logical data movement model should show data movement
requirements at
the dataset (entity/table)-level. It should detail the transformation
rules and target logical datasets (entity/tables). This model is still
considered technology independent.
It should
show each
interface between systems. The order entry system, which sends customer
name and address information to the billing application, may actually
send three or four files. Each file is called an interface. Each
interface should
be shown in the logical data movement model.
The focus at the logical level is on
the capture of actual source
tables and proposed target stores.
The
logical data movement model should be supported by a source to
target
mapping document, which should include business rules and
data
transformation rules.
- The
physical data movement model is a detailed representation
of the data
movement requirements at the dataset (table)-level that details the
transformation rules and target physical datasets (tables).
This
model is considered technology dependent. Best practice dictates
that there may be one-to-many physical models for each logical model.
Data movement model tools
Some
of the extract transform load tools have the ability to create data flow
diagrams. Best
practices suggest that these tools be used to create and
document
the data movement logical and physical models.
What data movement best practices are
required?
The following
topics should be covered by
best practices:
- Testing;
- Configuration management;
- Data movement software
and technologies should standardize
on a common data movement tool and ensure that this is included in the
information management technology standards list. Some
common tools include:
- Ab initio;
- Ascential datastage; and
- Informatica.
Some “open
source” ETL tools should also be considered.
- Standardize
on a job-scheduling
tool such as:
- Control-M for job scheduling; or
- Autosys
- Accountabilities
should be defined
and communicated to all stakeholders for each
key role in data
movement.
Summary...
Data management is a sub-set of
information management that governs
organization and control of the structure and design, storage,
movement, security and quality of information.
Enterprise business intelligence requires
extract transform load data movement standards to ensure rapid project delivery and optimal
return on information management investment.
|