Metadata Management

The vision of any business should include the desire to improve the company’s understanding of, and ability to use, its data through the disciplined management of business, technical and operational metadata.

Metadata Management

The effective management of metadata is one of the essential activities of a data steward within a governance practice. Metadata management refers to the activities that ensure metadata is created or captured at the point of the data’s creation, and that the broadest possible portfolio of meta-information is collected, stored in a repository for use by multiple applications, and controlled to remove inconsistencies and redundancies. When metadata is properly created, stored and controlled, a company can demonstrate the quality and lineage of the data it relies upon to make sound business decisions and to support its regulatory reporting to governments and shareholders.

Data is an Asset

Recognizing that data is an asset is not enough. Unless formal accountabilities and processes build upon that understanding and define how new data is introduced within the organization, data is not being treated as a valuable asset. Those processes should define the following.

  • How quality is measured.
  • How privacy guidelines are applied.
  • How data can be used and who is able to use it.

A successful implementation of data governance and metadata management requires that the following be true.

  • There is commitment from senior leaders that data is strategic to the organization’s long-term success.
  • There is a knowledge repository which both business people and technology people utilize to understand the current state of their data.
  • The organization treats data and the information describing it (metadata) as an asset every bit as valuable as the goods or services it produces.

The Business Case

Data governance is the system of establishing controls, decision rights, responsibilities, and accountabilities for data and information-related processes, which ensure the effective and efficient use of an organization’s information assets for achieving its corporate goals.

Effective governance of information assets demands setting up formal governance bodies and processes and establishing control of the organization’s knowledge about those assets. Metadata is the principal means by which this knowledge is represented. The effective management of metadata is one of the essential activities of a data steward within a governance practice.

Consequences of No or Poor Metadata Management

Here are some of the major consequences of a lack of metadata management.

  • Without technical metadata in a data store, information must be used without knowing which source system(s) provided it.
  • In companies with decentralized systems or poor data administration practices, it is "normal" to define data as it pertains to a particular business function rather than the enterprise as a whole. The result is multiple (and often different) definitions of the same data within a single company.
  • Mergers and acquisitions bring multiple companies, each with its own definitions of data, together as a single enterprise, compounding the problem further.
  • An inability to manage data definitions can result in a large amount of time being spent on unproductive work. It is not unusual for a data analyst to spend a disproportionate amount of time identifying and researching data, leaving less time for the analysis that delivers the real value to the organization.

There are typically two places in the organization where the business definition of data exists.

  • In people's heads - Unwritten rules for defining, requesting, entering, processing and making decisions from data exist at every point in the company where data is touched. Companies that survive on unwritten rules and definitions of data make themselves vulnerable to low-quality data and data misuse, because the data lacks consistency and its users lack confidence in it.
  • In data models - Companies that are inconsistent in how they manage their data models, or that use data modeling tools only on occasion, can suffer from the same problems as those with decentralized IT, poorly managed data administration and merged business units. Multiple data modeling tools can also propagate inconsistent data definitions if the information captured in the tools is not shared between data modelers or delivered to the data users.

Benefits of Good Metadata Management

Data stewardship for metadata increases data integrity, which benefits both technical and business users by raising their confidence in the quality of the information content in a data store. The stewardship activities that deliver this benefit include the following.

  • Creating and documenting the data definitions for the subject area’s entities and attributes.
  • Identifying the business and architectural relationships between objects.
  • Certifying the accuracy, completeness and timeliness of the content.
  • Establishing and documenting the context of the content (data heritage and lineage), so that the source or sources of any row can be determined easily and quickly from the technical metadata on that row.

The benefits of using technical metadata include source system identification, data quality measurement, and improved management of ETL processes and database administration.

Priorities for Implementing Metadata Management

To be successful, management must first properly size and understand the effort that will be required to manage metadata. As the first step towards sizing the effort, a company must select a finite amount of metadata to be managed.

Metadata typically originates in one of six areas.

  • Data administration, for example technical data management.
  • Database internal catalogs/data dictionaries and technical metadata on the rows in a table.
  • Data movement, such as ETL processes.
  • Business intelligence, such as data warehouses.
  • Data virtualization, such as web services and map services.
  • Business unit policies, taxonomies, functions or processes.

Successful implementation requires that thought be given to the capture and storage of metadata.

  • Storing metadata in a common repository enhances its usability.
  • Physical centralization is not always required for metadata management; the common repository can be a logical one. IT governance, a companion to data governance, should determine how the logical organization of metadata is implemented. For instance, some metadata may be stored in the operational data stores or data warehouse at load time and later read from there into a metadata repository.
  • A clear, consistent method of tagging the data originating from the operational systems needs to be developed and agreed upon by both the technical and business users of the data warehouse. Any technical metadata tied to a row must be applicable to the entire row of data, not just the majority of columns in the table. This is especially important when data is virtualized.
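
The whole-row tagging rule can be illustrated with a minimal Python sketch. The function name `tag_row`, the underscore-prefixed tag names and the example source systems `CRM` and `BILLING` are all hypothetical, chosen only for illustration. The point is that when a row is assembled from more than one operational system, the tag records every contributing source rather than only the one that supplied most of the columns.

```python
from datetime import datetime, timezone


def tag_row(columns, load_cycle_id, loaded_at):
    """Attach row-level technical metadata to one warehouse row.

    `columns` maps a column name to a (value, source_system) pair.
    All names here are hypothetical, for illustration only.
    """
    # Collect every contributing source so the tag is true of the whole row,
    # not just of the majority of its columns.
    sources = sorted({source for _, source in columns.values()})
    row = {name: value for name, (value, _) in columns.items()}
    row["_source_systems"] = sources
    row["_load_cycle_id"] = load_cycle_id
    row["_load_date"] = loaded_at
    return row


row = tag_row(
    {
        "customer_name": ("Ada Example", "CRM"),
        "email": ("ada@example.com", "CRM"),
        "balance": (125.50, "BILLING"),
    },
    load_cycle_id=42,
    loaded_at=datetime(2024, 1, 15, tzinfo=timezone.utc),
)
print(row["_source_systems"])  # ['BILLING', 'CRM']
```

Recording a list of sources is one possible design; an organization could equally agree on a single composite identifier, provided it still describes the entire row.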

Metadata Management Model

Why Implement a Metadata Management Strategy?

Metadata is used to represent the knowledge captured about the information of interest to the organization as well as the environments through which it is governed, controlled, captured, persisted and distributed. Some of the benefits to an organization adopting a formal, structured metadata management strategy are as follows.

  • Common language between Business and IT.
  • The opportunity to improve data quality through greater understanding of enterprise data.
  • Reduced redundancy of metadata.
  • Reduced reconciliation efforts around data definition.
  • Reduced loss of knowledge when staff transfer, retire, or leave the company.
  • Reduced effort in learning new data sources from data vendors or new systems of record.
  • Reduced development cycle times for new and existing systems.
  • More relevant results from enterprise search engines.

Metadata Promotes a Collaborative Information Management Environment

A “Common Information Model” of metadata provides capabilities that promote collaboration between groups.

  • Visibility into information management policies and guidelines as captured in the business rules.
  • A view of the business in terms of the business processes it performs and the information resources it uses to perform them.
  • A view of the processing systems that IT supports, plus the logical data structures and their physical implementations.
  • Valuable information needed for data governance.

Metadata Repository Capabilities

Any metadata repository must support the maintenance of business, technical and operational metadata. At a high level this means it must capture the following information.

  • Business dictionary.
  • Business taxonomy.
  • Business rules.
  • Data lineage.
  • Data quality.
  • Any other mandatory items in ISO 19115 not covered by any of the above.
  • Additional items required for data governance as it relates to metadata.
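
As a rough illustration of how a repository entry might hold several of these items together, the following Python sketch models a single business term. The class names `MetadataEntry` and `MetadataRepository`, the field names and the 0.0 to 1.0 quality score are assumptions made for this example, not a prescribed schema; the ISO 19115 and governance items are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MetadataEntry:
    """One business term and its metadata (hypothetical field names)."""
    term: str                                  # business dictionary term
    definition: str                            # business dictionary definition
    taxonomy_path: Tuple[str, ...]             # position in the business taxonomy
    business_rules: List[str] = field(default_factory=list)
    lineage: List[str] = field(default_factory=list)  # upstream sources, in order
    quality_score: Optional[float] = None      # assumed 0.0-1.0 quality measure


class MetadataRepository:
    """In-memory stand-in for a shared metadata repository."""

    def __init__(self) -> None:
        self._entries: Dict[str, MetadataEntry] = {}

    @staticmethod
    def _key(term: str) -> str:
        # Normalize so "Customer" and " customer " refer to the same term.
        return term.strip().lower()

    def register(self, entry: MetadataEntry) -> None:
        existing = self._entries.get(self._key(entry.term))
        # Reject a second, different definition of the same term.
        if existing is not None and existing.definition != entry.definition:
            raise ValueError(f"conflicting definition for {entry.term!r}")
        self._entries[self._key(entry.term)] = entry

    def lookup(self, term: str) -> Optional[MetadataEntry]:
        return self._entries.get(self._key(term))


repo = MetadataRepository()
repo.register(MetadataEntry(
    term="Customer",
    definition="A party that has purchased at least one product.",
    taxonomy_path=("Party", "Customer"),
    lineage=["CRM", "warehouse.dim_customer"],
))
```

Rejecting a conflicting definition at registration time is one simple way to surface the inconsistencies that a repository is meant to control; a production tool would more likely route the conflict to a data steward for resolution.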

Technical Metadata Attributes

As mentioned earlier, metadata is kept not only in the metadata repository but also within the operational or warehouse data stores. Technical metadata that is maintained outside the metadata repository could include the following.

  • Load Date, which indicates when (date and/or time) a row of information was loaded.
  • Update Date, which denotes when a row was last updated.
  • Load Cycle Identifier, which is a sequential identifier assigned during each load cycle to the data store, regardless of the refresh frequency.
  • Current Flag Indicator, which identifies the latest version of a row in a table.
  • Operational System Identifier, which is used to track the originating source(s) of a data row.
  • Confidence Level Indicator, which uses a ranking value to indicate how business rules or assumptions were applied during the ETL processes for a particular row of data. This tag gives the user a measure of the credibility of a data row based on the transformation processing performed. It is used to identify potential data quality problems originating in source systems and to facilitate correcting them.
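
The attributes above could be carried on each row as a small typed record. The sketch below is one possible shape in Python; the class and function names, and the assumption that the confidence ranking is an integer where a higher value means a more credible row, are illustrative choices rather than part of any standard.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class RowTechnicalMetadata:
    """Row-level technical metadata (hypothetical field names)."""
    load_date: datetime                     # Load Date
    load_cycle_id: int                      # Load Cycle Identifier
    source_system_id: str                   # Operational System Identifier
    confidence_level: int                   # assumed ranking, higher = more credible
    current_flag: bool = True               # Current Flag Indicator
    update_date: Optional[datetime] = None  # Update Date


def supersede(old: RowTechnicalMetadata, when: datetime) -> RowTechnicalMetadata:
    """Return a copy of `old` marked as no longer the current version."""
    return replace(old, current_flag=False, update_date=when)


def needs_review(rows: List[RowTechnicalMetadata], threshold: int = 3):
    """Select current rows whose confidence ranking is below `threshold`."""
    return [r for r in rows if r.current_flag and r.confidence_level < threshold]


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t1 = datetime(2024, 1, 2, tzinfo=timezone.utc)
v1 = RowTechnicalMetadata(t0, load_cycle_id=1, source_system_id="CRM", confidence_level=2)
v2 = RowTechnicalMetadata(t1, load_cycle_id=2, source_system_id="CRM", confidence_level=4)
v1_closed = supersede(v1, when=t1)  # v2 replaces v1 as the current version
```

Here `needs_review([v1, v2])` returns only `v1`, the current row with a low ranking, which is the kind of row a steward would trace back to its source system.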