Why should we manage metadata, the information that describes the various facets of our data? What are those facets? How are they managed? It sounds expensive, both in time and personnel. Is it only for really big companies? Questions abound from developers, project managers, and executives. Using a fictitious vehicle rental service, we will explore the key categories of metadata: descriptive, structural, and administrative.
This article assumes that you understand the data within your organization. You should know which data is valuable to your company, also known as critical data elements, and you should know many or most of these elements with great clarity and precision. You should also have a general understanding of your sources of data. The sources likely have multiple date references, file types, and role restrictions. Most modern systems use externally sourced data ranging as widely as "construction starts" and "hourly weather in a few zip codes." With this background, a technical or business analyst works with metadata daily or weekly, and for many reasons. Descriptive, structural, and administrative metadata inform various views of organizational health.
What Is Metadata?
In the simplest terms, metadata describes the various facets of information. It is not the data within transaction systems. Instead of "2018 Toyota Camry to Andrew Wood on January 22, 2019", metadata captures the counts, roles, data types, and other profiles of that data, along with the usage of hardware components. The key categories of metadata are descriptive, structural, and administrative.
Descriptive metadata in the fictitious vehicle rental service includes keywords and search identifiers for vehicle type, customer classification, dates, locations, and service suppliers. In the previous example, descriptive metadata focuses on the sedan classification and the supply chain from acquisition through disposition. Descriptive metadata for a transaction is different from descriptive metadata for individual activities within the transaction. For a transaction, critical descriptive metadata elements are location, vehicle class, customer status, and dates. Within the transaction, descriptive metadata includes vehicle turnaround, point-in-time status, and customer demographics.
For a transaction, metadata on the number of vehicles in each classification is valuable information to an analyst reviewing a region or specific location. Another business analyst might seek descriptive information on the financing source, whether a traditional commercial bank or a vehicle financing firm. Financial analysts focus on descriptive metadata, rather than transaction data, when they ask which terms were most favorable for a vehicle classification, location, customer class, funding source, or a combination of these.
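As a minimal sketch of the first case, assume the rental transactions are available as simple records. The field names `rental_id`, `location`, and `vehicle_class`, and the sample values, are hypothetical and chosen only for illustration. Counting rentals per classification and location yields descriptive metadata rather than transaction data:

```python
from collections import Counter

# Hypothetical rental transactions; field names and values are illustrative only.
transactions = [
    {"rental_id": 1, "location": "Austin", "vehicle_class": "sedan"},
    {"rental_id": 2, "location": "Austin", "vehicle_class": "sedan"},
    {"rental_id": 3, "location": "Austin", "vehicle_class": "suv"},
    {"rental_id": 4, "location": "Dallas", "vehicle_class": "sedan"},
]


def class_counts_by_location(rows):
    """Count rentals for each (location, vehicle_class) pair."""
    return Counter((row["location"], row["vehicle_class"]) for row in rows)


counts = class_counts_by_location(transactions)

# Print one summary line per location and classification.
for (location, vehicle_class), n in sorted(counts.items()):
    print(f"{location:<8}{vehicle_class:<8}{n}")
```

The individual rentals never appear in the result; the analyst sees only the profile of the data.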
Structural metadata includes technical information about a digital object, such as file format, size, media, and source details. In the vehicle rental service, several systems can have identifiers that allow integration of data. In these cases, the structural metadata provides a profile of the systems. This structural information includes the ongoing count of the files, folders, network traffic I/O rates, and sizes in hot and cold storage. Structural metadata is most closely associated with the technical environment in which a system operates.
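One way such a profile might be collected for a file store is sketched below. The function name `profile_directory` and the sample file names are assumptions made for illustration; the demo builds a throwaway directory so the sketch runs anywhere.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def profile_directory(root):
    """Return the file count and total bytes per file extension under root."""
    profile = defaultdict(lambda: {"files": 0, "bytes": 0})
    for path in Path(root).rglob("*"):
        if path.is_file():
            ext = path.suffix.lower() or "(none)"
            profile[ext]["files"] += 1
            profile[ext]["bytes"] += path.stat().st_size
    return dict(profile)


# Demo on a temporary directory with hypothetical rental-service files.
with tempfile.TemporaryDirectory() as root:
    contracts = Path(root) / "contracts"
    contracts.mkdir()
    (contracts / "rental_0001.pdf").write_bytes(b"x" * 2048)
    (contracts / "rental_0002.pdf").write_bytes(b"x" * 1024)
    (Path(root) / "fleet.csv").write_bytes(b"vin,class\n")
    demo_profile = profile_directory(root)

for ext, stats in sorted(demo_profile.items()):
    print(ext, stats["files"], stats["bytes"])
```

Running the same profile on a schedule and storing the results gives the ongoing counts and sizes described above.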
This metadata is valued for determining the overall data growth rate. For the vehicle rental system, structural metadata is valuable when discussing the performance of systems related to promotions and planning with service providers for increased resource needs during campaigns. Some analysts will focus on the number of additional terabytes of storage or processing, and other analysts will focus on the percentage increase in resources to maintain a specific performance agreement. In both cases, structural metadata will be used by the analysts to determine the technical resource baseline that supports the health of the organization.
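The two viewpoints are the same measurement expressed differently. A small sketch, using made-up storage figures, shows both the absolute and the percentage change against a baseline:

```python
def growth(baseline_tb, current_tb):
    """Return (additional terabytes, percentage increase) over the baseline."""
    added = current_tb - baseline_tb
    percent = added / baseline_tb * 100
    return added, percent


# Hypothetical figures: storage before and after a promotional campaign.
added_tb, pct = growth(baseline_tb=40.0, current_tb=50.0)
print(f"Added {added_tb} TB, an increase of {pct}% over the baseline")
```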
Administrative metadata includes the lineage of the data, the kinds and number of transformations, and counts of inaccurate data from profiling sessions. Processes create metadata that identifies operational aspects of data ingestion from multiple sources. Process counts, sources, volume ingested, transformation types, run dates, and rows rejected for specific reasons are observed and tracked. Data in production systems ranges from very slowly changing data, such as the vehicle classifications, to slowly changing information, such as manufacturer, make, and model, to rapidly changing information, such as the details for each vehicle rental.
In a vehicle rental service, the lineage of the 2018 Toyota Camry starts with the vehicle identification number applied at manufacturing. The vehicle might be sourced in one of several ways for the fictitious vehicle rental service. Perhaps individually owned vehicles are offered for rent. Manufacturers and their dealer networks are sources. Corporations lease a dozen or 1,000 vehicles in a single agreement. With this administrative metadata, analysts determine whether production processes are impacting organizational health.
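The tracked items can be pictured as one record per ingestion run. The sketch below is an assumption about how such a record might look; the class name `IngestionRun`, its fields, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class IngestionRun:
    """Administrative metadata for a single ingestion run (hypothetical layout)."""
    source: str
    run_date: date
    rows_ingested: int              # rows accepted into the target system
    transformations: list
    rejected: dict = field(default_factory=dict)  # rejection reason -> row count

    @property
    def rejected_total(self):
        return sum(self.rejected.values())

    @property
    def reject_rate(self):
        """Share of all rows seen (accepted plus rejected) that were rejected."""
        total = self.rows_ingested + self.rejected_total
        return self.rejected_total / total if total else 0.0


run = IngestionRun(
    source="dealer_feed",
    run_date=date(2019, 1, 22),
    rows_ingested=980,
    transformations=["trim_whitespace", "normalize_vin"],
    rejected={"missing_vin": 15, "invalid_date": 5},
)
print(run.source, run.rejected_total, f"{run.reject_rate:.1%}")
```

Keeping the rejection reasons separate, rather than a single count, lets an analyst see whether one source or one rule is responsible for most of the rejected rows.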
Why Do We Manage Metadata?
Metadata management is performed to ensure that business analysts and stewards have identified the critical data elements by which an organization is measured, both internally and externally. These critical data elements are described individually in a business glossary. The glossary can be as simple as an Excel workbook or a Google Sheets spreadsheet, or as advanced as an automated repository created by software within a data flow step. Narrow-scoped cloud-based metadata management may be most useful for an enterprise with many locations, users, roles, and systems. Managing the critical data elements in a collection called the business glossary is a sign of a maturing data culture in an organization.
Business glossaries are typically maintained by a group of employees whose major focus is on a departmental aspect of an organization. These employees take on a data stewardship role in addition to their major responsibilities. These data stewards review the critical data elements and the terminology used to refer to them. Terms that are referenced in one department must be understood by another department for operations to flow smoothly. In the rental service example, consider the customer. Is Andrew Wood renting a vehicle as a sole individual, or is the transaction related to delivery of a company vehicle to Mr. Wood, the employee? What does the metadata tell us when we focus on customer status, vehicle type, and aggregated rates?
Metadata reveals much about many functions within an organization. Metadata forms the basis to inventory, describe, and understand an organization's data from multiple perspectives. Complete identification of the critical data elements, with metadata at a useful level of granularity, is valuable. Abbreviations, terminology, rules for business and quality processing, and user role documentation are starting points for building a glossary. Modern tools and services include management features for several of the common metadata indicators.
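A glossary at the simple end of that range might look like the following sketch. The column names, definitions, stewards, and synonyms are hypothetical; the point is only that the two meanings of "customer" become separate, agreed-upon terms that any department can look up, and that the result exports to a spreadsheet-friendly CSV.

```python
import csv
import io

FIELDS = ["term", "definition", "steward", "synonyms"]

# Hypothetical glossary entries; synonyms are separated by semicolons.
glossary = [
    {"term": "Individual Customer",
     "definition": "A person renting a vehicle on their own account.",
     "steward": "Sales", "synonyms": "Retail Renter"},
    {"term": "Corporate Customer",
     "definition": "A company whose employees receive vehicles under an agreement.",
     "steward": "Fleet Accounts", "synonyms": "Corporate Account; Fleet Client"},
]


def lookup(entries, name):
    """Find an entry by term or synonym, ignoring case."""
    wanted = name.casefold()
    for entry in entries:
        names = [entry["term"]]
        names += [s.strip() for s in entry["synonyms"].split(";") if s.strip()]
        if wanted in (n.casefold() for n in names):
            return entry
    return None


def to_csv(entries):
    """Serialize the glossary as CSV text that opens in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


print(lookup(glossary, "fleet client")["term"])
print(to_csv(glossary))
```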
How Is Metadata Managed?
Too often, metadata is managed poorly. In many cases, it is not managed at all. In maturing data management operations, technical and business analysts process data with a focus on measuring key performance indicators within descriptive, structural, and administrative metadata.
Here are common metadata performance indicators. Few companies fund resources for all of these, and some are funded or managed as elements of other data management initiatives.
API Metadata - Data sourced in systems for publishing, versioning, monitoring and securing APIs at appropriate scale
Impact Analysis - Identifies the potential consequences of a change, or estimates what needs to be modified to accomplish a change
Business Glossary - Communicates and governs the organization’s business concepts, terminology, definitions and relationships
Data Dictionary - Centralized repository of information about data, including origin, usage, format, and meaning
Data Lineage - Describes data lifecycle from source through processes and transformations
Data Mapping - Finds transformations between two data sets, discovering substrings, concatenations, arithmetic, case statements, other transformation logic, and semantic data element synonyms
Data System Metadata - Inventories each physical element within an ecosystem to the level of specific manufacturer and model
Data Usage Statistics - Details how many resources are needed by each application
Document/File Metadata - Describes physical files, including type, format, size, dates, users, and source-specific data
Relational Metadata - Inventory of databases, entities, attributes, data types, row counts, size on storage, domains, owner
Searchable Index - Describes search terms and the volume of interest in them within a given context
Social Context - Setting in which people create and interact with topics
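Several of these indicators can be gathered with very little tooling. As a rough sketch of the Relational Metadata entry, the following inventories the tables, columns, data types, and row counts of a SQLite database using Python's standard `sqlite3` module. The `vehicle` table and its placeholder row are hypothetical, and other database engines expose the same information through their own catalogs rather than `sqlite_master` and `PRAGMA table_info`.

```python
import sqlite3


def inventory(conn):
    """Return {table: {"columns": [(name, type), ...], "rows": n}} for a SQLite database."""
    result = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
        info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        columns = [(row[1], row[2]) for row in info]
        (rows,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        result[table] = {"columns": columns, "rows": rows}
    return result


# Hypothetical in-memory database for the vehicle rental example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vehicle (vin TEXT PRIMARY KEY, make TEXT, model TEXT, model_year INTEGER)"
)
conn.execute("INSERT INTO vehicle VALUES ('VIN-PLACEHOLDER-1', 'Toyota', 'Camry', 2018)")

meta = inventory(conn)
for table, details in meta.items():
    print(table, details["rows"], details["columns"])
```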
Seldom do technologists and business analysts manage more than a few of these measurable processes. Executives at a strategic level will want to be familiar with the terminology and should review results on a regular schedule, anywhere from monthly to twice per year, to assess the maturity of the metadata management function. Descriptive, structural, and administrative information are factual indicators of the health of an organization.