EDW Reference Architecture: Metadata

As a final note on the series EDW Reference Architecture, introduced in Why Bother? and continued in discussions on the Acquisition, Integration, Warehousing and Provisioning layers, this article discusses an approach to metadata.

Metadata

Metadata is information about information, and can encompass anything pertaining to its definition, structure, content, integrity, lineage and processes.

Metadata Types

Ralph Kimball, a data management leader, usefully distinguishes between types of metadata:

Technical

This is information about the structures and content of the data, including data types, profiling results that reveal details of data quality and integrity, and other structural information that will have meaning primarily to those involved in development, but also to those who will maintain the system.

Business

This is the definition of the data as it pertains to the business, together with information about the source of the data and its timeliness. It may also link to the reports in which the data is used. These are the aspects of the data that are of interest to business users.

Process

This is information about the processes through which the data has passed. This includes relationships to the multiple ETL processes that have migrated the data from source to target, but may also refer to the project responsible for bringing it into the EDW, processes and applications that extract it, and usage statistics related to operational and analytical applications. This will be of interest mostly to those maintaining the EDW.

Technical metadata on sources can be collected quite easily, and can be useful for downstream processing, audit and quality checks. However, it has very limited (if any) use to the business, and taken in isolation it contributes little to understanding the end-to-end system. Likewise, business definitions held in a glossary are useful to the business, but without links to sources, targets and aliases throughout the system they have limited value to the development process. Each organization must determine where and how it will derive value from metadata, but in most cases optimal value will come from a cross-section of all three types, with a consolidated view of the technical, business and process-related aspects of the data.
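As an illustration of what a consolidated view could look like, here is a minimal Python sketch of a single repository entry that links the three metadata types for one data element. The class names, field names and sample values are all hypothetical, chosen only for this example; a real repository would define its own model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TechnicalMetadata:
    data_type: str      # physical type, e.g. "VARCHAR(254)"
    null_ratio: float   # share of null values found by profiling


@dataclass
class BusinessMetadata:
    definition: str     # glossary definition
    source_system: str  # system the business recognizes as the source
    reports: List[str] = field(default_factory=list)  # reports using the element


@dataclass
class ProcessMetadata:
    etl_jobs: List[str] = field(default_factory=list)  # jobs that moved the data
    project: str = ""   # project that brought the element into the EDW


@dataclass
class DataElement:
    """One consolidated entry linking all three metadata types."""
    name: str
    technical: TechnicalMetadata
    business: BusinessMetadata
    process: ProcessMetadata


def describe(element: DataElement) -> str:
    """Summarize an element across technical, business and process metadata."""
    return (
        f"{element.name} ({element.technical.data_type}): "
        f"{element.business.definition} "
        f"Loaded by {', '.join(element.process.etl_jobs)}; "
        f"used in {', '.join(element.business.reports)}."
    )


# Hypothetical sample entry, for illustration only.
customer_email = DataElement(
    name="customer_email",
    technical=TechnicalMetadata(data_type="VARCHAR(254)", null_ratio=0.02),
    business=BusinessMetadata(
        definition="Primary contact email address of the customer.",
        source_system="CRM",
        reports=["Customer Contact List"],
    ),
    process=ProcessMetadata(
        etl_jobs=["load_stg_customer", "load_dim_customer"],
        project="Customer 360",
    ),
)

print(describe(customer_email))
```

Keeping the three types under one key (here the element name) is what allows a glossary definition to be traced to its physical column and to the jobs that load it.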

Metadata Challenges

Some of the challenges related to metadata include:

  • Lack of connection between work processes and metadata collection
  • Inability to generate impact analysis and data lineage reports
  • Inability to connect disparate sets of metadata, where metadata collection does occur
  • Lack of leverage of metadata for support of development processes
  • Lack of comprehensive solution to collect, manage and consume metadata

A frequent complaint about metadata is that it may be easy to collect, but it is difficult to keep up to date. Similarly, the maintenance of metadata is often an extra job that sits outside the team developing and maintaining the EDW. Even when it is gathered and managed well, the metadata tends to reside in silos, with profiling results, ETL metadata and business definitions held separately and never cross-referenced.

To make it more cost-effective, metadata must be part of EDW change management; to keep it in good order, the responsibility to collect and maintain metadata must belong to the development team; and to extract the most value from it, the metadata must be integrated.

Metadata Uses

In order to establish a set of criteria by which to judge the value of a given metadata strategy, it is important to first identify the purpose of such an initiative. To be of use, metadata must be part of a Practical Governance Framework:

  • Governance is the foundation of the Development Activities
  • Development Activities populate the Metadata Repository
  • Metadata Repository makes possible Governance Activities
  • Governance Activities are targeted to meet Project Objectives
  • Business Objectives are designed to follow Business Drivers
  • Business Drivers show EDW benefit to the organization

Using this framework, the only metadata collected will be that which serves the overall objectives of improving:

  • Efficiency of the development process
  • Accuracy of the information being processed
  • Productivity of business users

A successful metadata strategy must be geared towards enabling capabilities. It will involve focusing on the type of metadata required, recognizing and addressing the challenges of collecting and maintaining that metadata and incorporating the use of the metadata in the development process.

Here is a list of potential requirements that metadata can fulfill:

  • Data Quality Assessment:
    • Are the data elements adequately and accurately defined?
    • Of the required data elements, how many show integrity issues through profiling? (Do we have to manually cross-reference them?)
      • Do duplicate records exist? (Analysis of distinct values through profiling)
      • Are there special characters to contend with?
      • Is the data populated to an acceptable level? (Analysis of null values through profiling)
      • Are the values within expected ranges? (Analysis of min and max values and frequency distributions through profiling)
  • Project/Release management:
    • How much of the requirement has been completed? At what stage is it now?
    • How much overlap in data is happening between projects?
  • Data Lineage:
    • How is the data mapped from source through to information provisioning?
    • Where does the data on the reports ultimately come from?
  • Impact Analysis:
    • If a data element is removed from scope, what reports are affected?
    • If a report is removed from scope, what data elements are no longer needed?
  • Model Validation:
    • Will the data model accommodate the source data? (Cross-reference of target physical model with min and max lengths through profiling)
    • Does the data model include all required source data elements?
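The profiling checks named under Data Quality Assessment and Model Validation can be sketched in a few lines of Python. This is a minimal illustration rather than a description of any particular profiling tool: the function names, the treatment of empty strings as nulls, and the sample postal codes are assumptions made for the example.

```python
from collections import Counter


def profile_column(values):
    """Return simple profiling metadata for one column of raw values."""
    # Assumption: None and empty strings both count as "not populated".
    non_null = [v for v in values if v is not None and v != ""]
    lengths = [len(str(v)) for v in non_null]
    counts = Counter(non_null)
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(counts),
        # Values that occur more than once (possible duplicate records).
        "duplicate_values": sorted(v for v, n in counts.items() if n > 1),
        # Values containing anything other than letters and digits.
        "special_char_count": sum(1 for v in non_null if not str(v).isalnum()),
        "min_value": min(non_null) if non_null else None,
        "max_value": max(non_null) if non_null else None,
        "min_length": min(lengths) if lengths else 0,
        "max_length": max(lengths) if lengths else 0,
    }


def fits_target(profile, target_length):
    """Model validation: does the longest source value fit the target column?"""
    return profile["max_length"] <= target_length


# Hypothetical source column, for illustration only.
postal_codes = ["K1A0B1", "M5V2T6", None, "K1A0B1", "H2Y 1C6", ""]
profile = profile_column(postal_codes)

print(profile)
print("Fits a 6-character target column:", fits_target(profile, 6))
```

Storing the resulting dictionary alongside the column definition is one way to make profiling results available for cross-referencing against the target physical model.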

There are many other valuable applications of metadata. It is important to identify each application, along with the type of metadata it needs, the process that generates that metadata, the mechanism that can be incorporated into the process to collect it, and the way it will be used to deliver value.
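To make the Data Lineage and Impact Analysis questions listed earlier concrete, the sketch below treats source-to-target mappings as a directed graph and walks it in either direction. The mapping list, the element names and the "report." naming prefix are hypothetical; in practice the mappings would be collected from the ETL process itself.

```python
from collections import defaultdict, deque

# Hypothetical source-to-target mappings, one (source, target) pair per hop.
MAPPINGS = [
    ("crm.customer.email", "stg.customer.email"),
    ("stg.customer.email", "edw.dim_customer.email"),
    ("edw.dim_customer.email", "report.customer_contact_list"),
    ("edw.dim_customer.email", "report.campaign_reach"),
    ("erp.orders.amount", "stg.orders.amount"),
    ("stg.orders.amount", "edw.fact_sales.amount"),
    ("edw.fact_sales.amount", "report.monthly_revenue"),
]


def build_graph(mappings, reverse=False):
    """Index mappings by source (or by target when reverse=True)."""
    graph = defaultdict(set)
    for source, target in mappings:
        if reverse:
            graph[target].add(source)
        else:
            graph[source].add(target)
    return graph


def reachable(graph, start):
    """Breadth-first walk returning every node reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def impacted_reports(mappings, element):
    """Impact analysis: which reports depend on this data element?"""
    downstream = reachable(build_graph(mappings), element)
    # Assumption: report nodes are identified by a "report." prefix.
    return sorted(n for n in downstream if n.startswith("report."))


def lineage(mappings, report):
    """Data lineage: every upstream element that feeds this report."""
    return sorted(reachable(build_graph(mappings, reverse=True), report))


print(impacted_reports(MAPPINGS, "crm.customer.email"))
print(lineage(MAPPINGS, "report.monthly_revenue"))
```

The same mapping data answers both questions, which is one reason integrated metadata tends to be more valuable than metadata held in separate silos.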

What is your organization’s approach to metadata? What were your challenges? What capabilities did it give you?


