Data Warehouse Principles
Need to understand data warehouse principles for an information management project?
Data warehousing is a subset of data management, which in turn is a subset of information management, the discipline that governs the organization and control of the structure, design, storage, movement, security and quality of information.
What Is Data Storage?
Data storage, in this context, covers the principles, standards, best practices, processes, procedures and metadata required to ensure effective and efficient storage of the information needed for business intelligence solutions.
Typical data warehouse principles
- Data is an enterprise asset, even though it has too often been viewed as belonging to particular individuals or as simply part of an application. While data is a shared asset, the IT department should have organizational responsibility for managing the technology infrastructure that supports it.
  This principle implies that the enterprise data model needs to be shared effectively and that far greater rigor is required in integrating, managing, and cataloging data.
  The implications are far-reaching and include improvements in data quality, greater use of metadata, careful schema version management, and support for an enterprise data warehouse;
- The business is the guardian of data. Data assets should have owners in the business, known as data guardians and data stewards.
  This implies significant business involvement in developing business definitions for entities and attributes. Data stewards should have an active role in defining data quality specifications such as valid values and required relationships.
  All of this metadata should be stored in a centrally managed “metadata repository”, an online data dictionary that describes the corporate data asset and the rules that protect it.
  Data stewards should also have a role in determining the appropriate security levels for the data assets;
- Data should be secured based on risk analysis, with the appropriate level of security implemented for each data element within the enterprise. A risk-based cost analysis should drive all security decisions.
  Stakeholders should balance the cost of strict security measures, and their potential for blocking legitimate access, against the risks presented by less stringent policies.
  This implies that the IT department places sufficient security infrastructure around data assets, and that data guardians make well-reasoned choices regarding legitimate access needs versus the cost of inadequately secured data;
- Data should be stored in fewer databases, since it is better to maintain a few large multi-subject-area databases than many application-specific databases;
- Limit dependence on physical data structures and ensure that business logic is insulated from the details of database structures; and
- Single point for data manipulation: there should be a single application, function library, or component that manages all manipulation of data stored in systems of record (SORs).
  This principle recognizes that data quality should begin at the source system, where data is created, updated and deleted.
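
The last two principles, together with the steward-defined quality rules mentioned earlier, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CustomerGateway` class, the `customer` table and the `VALID_STATUSES` set are hypothetical names invented for the example, and the built-in `sqlite3` module stands in for a real system of record.

```python
import sqlite3
from typing import Optional

# Hypothetical steward-defined rule: the valid values for one attribute.
VALID_STATUSES = {"active", "inactive", "prospect"}


class CustomerGateway:
    """Single component through which customer data is created, updated and deleted.

    Callers never see table or column names, so the physical structure
    can change without touching business logic.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS customer ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL, status TEXT NOT NULL)"
        )

    def _check_status(self, status: str) -> None:
        # Enforce the quality rule at the source, before any write happens.
        if status not in VALID_STATUSES:
            raise ValueError(f"invalid status: {status!r}")

    def create(self, name: str, status: str) -> int:
        self._check_status(status)
        cur = self._conn.execute(
            "INSERT INTO customer (name, status) VALUES (?, ?)", (name, status)
        )
        self._conn.commit()
        return cur.lastrowid

    def set_status(self, customer_id: int, status: str) -> None:
        self._check_status(status)
        self._conn.execute(
            "UPDATE customer SET status = ? WHERE id = ?", (status, customer_id)
        )
        self._conn.commit()

    def delete(self, customer_id: int) -> None:
        self._conn.execute("DELETE FROM customer WHERE id = ?", (customer_id,))
        self._conn.commit()

    def get(self, customer_id: int) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT id, name, status FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            return None
        return {"id": row[0], "name": row[1], "status": row[2]}


if __name__ == "__main__":
    gateway = CustomerGateway(sqlite3.connect(":memory:"))
    cid = gateway.create("Acme Ltd", "prospect")
    gateway.set_status(cid, "active")
    print(gateway.get(cid))
```

Because every write passes through `_check_status`, an invalid value is rejected before it reaches the table, and business code depends only on the gateway's methods rather than on the schema.
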
Typical software components include:
- Relational database management system (RDBMS) software, which stores the data within the respective environments;
- Extract, transform and load (ETL) software, which moves the data between environments; and
- Business intelligence reporting/analytical tools, which provide reporting and decision support capabilities to end users.
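
As a rough illustration of how these three component types fit together, the sketch below uses Python's built-in `sqlite3` module as a stand-in for the RDBMS, a few plain functions for the ETL step, and a SQL aggregate in place of a reporting tool. The table names (`orders`, `fact_sales`), the columns and the sample rows are all made up for the example; real ETL and business intelligence products do far more.

```python
import sqlite3


def extract(source: sqlite3.Connection) -> list:
    """Read raw order rows from the (hypothetical) source system."""
    return source.execute("SELECT region, amount_cents FROM orders").fetchall()


def transform(rows: list) -> list:
    """Normalize region names and convert cents to currency units."""
    return [(region.strip().upper(), cents / 100) for region, cents in rows]


def load(warehouse: sqlite3.Connection, rows: list) -> None:
    """Write the cleaned rows into the warehouse fact table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales (region TEXT, amount REAL)"
    )
    warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?)", rows)
    warehouse.commit()


def sales_by_region(warehouse: sqlite3.Connection) -> dict:
    """Reporting query: total sales per region."""
    rows = warehouse.execute(
        "SELECT region, SUM(amount) FROM fact_sales GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)


def build_demo_source() -> sqlite3.Connection:
    """Create an in-memory source system with a few invented orders."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (region TEXT, amount_cents INTEGER)")
    source.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(" east", 1250), ("East ", 750), ("west", 3000)],
    )
    source.commit()
    return source


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    load(warehouse, transform(extract(build_demo_source())))
    print(sales_by_region(warehouse))
```

The transform step is where inconsistent source values (here, stray whitespace and mixed case in `region`) are reconciled, so the reporting query sees one row per region.
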
Summary...
Data warehouse principles, standards, best practices, processes, procedures and metadata are required to ensure effective and efficient storage of the information needed for business intelligence solutions.
Standards and best practices are required to ensure rapid project delivery and optimal return on information management investment.