Data Governance: Learning Data Lessons

From a macro viewpoint, data governance is data governance, regardless of the setting. The micro view, however, should celebrate the nuances of data governance. One of those nuances is Data Governance for Education, which presents some unique challenges that are not always obvious.

Whether the notepaper being used is kindergarten blue or college ruled, approaches to managing data in academia have been hampered by a lack of funds and resources. As a result, although educational institutions could benefit greatly from the knowledge held in the data they store, they are frequently prevented from learning important data lessons. Challenges distinct to education include the storage of massive amounts of unstructured or historical data that has never been digitized, the need for a robust privacy approach to protect student data, the maintenance of syllabi and course materials, the faculty information that must be maintained to ensure qualified instructors, certification and re-certification preparation information, and human resources data, to name just a few.

Despite these complexities, it is clear that educational environments receive overwhelming amounts of data that must be turned into data assets. The question is, “What is the best approach to bring universal meaning and order to that data?”

The Voice of Authority

Just as sports teams need a coach to help them excel, data governance initiatives need authoritative sponsorship. Taken as a whole, the data governance problem would be overwhelming, and without a sponsor to facilitate the work, mitigate obstacles and provide appropriate resources, the effort would likely never move beyond the outline stage. Once a sponsor is identified, the scope of the Data Governance initiative can be determined and a plan of action designed.

Defining the Problem

Before the problem can be solved, it needs to be understood. An example of a problem statement might be, “Data is being stored in a manner that precludes its usefulness both for day-to-day operations and planning.” From there, the problem statement can be further defined:

  • There is no road map for defining the business terms, definitions, appropriate data uses or metadata.
  • There is no universal understanding of how data assets are chained together.
  • There are no roles and responsibilities directly tied to data quality, data protection or data governance.

Working the Problem

With the problem statement defined, the data governance effort should begin with a plan to transform the “As Is” state into the “To Be” state.

Breaking the process into logical sections, incrementally moving forward, establishing checkpoints, re-assessing when necessary and publishing successes to appropriate officials will help ensure that the work effort remains on track.

Strength from Weakness

Data silos are typically among the weaknesses identified in the majority of Data Governance initiatives. While silos lead to data inconsistencies and often prevent effective communication, those same silos can become a strength during an initial Data Governance effort. Since most educational environments have developed logical data silos (departments, schools, etc.) simply by the nature of their mission, one approach might be to use those silos to initially divide the task into manageable modules.

Silos, Experts and Objectives

A Data Governance team is now needed. It should be composed of members who are familiar with the full spectrum of academic operations and data flows. These are typically not information technology professionals but the ‘business’ experts. Depending on the type and size of the educational environment, members could include employees of the administrative branches, academic advisors or representatives of the offices of the Dean or Provost. These should be the individuals who understand the data, how it is sourced and how it is used. Employees who are familiar with problems caused by poor data quality can be instrumental in driving positive change, so including them on the team is ideal.

This stage begins with some detective work and may produce some unexpected findings. The silos defined in the first discussions may not be the ones listed at the final meeting. That is one benefit of bringing this team together, since initial assumptions may not match reality.

What are the Data Assets?

Now that the silos have been identified, the composition of the Data Governance team may need to change. The members now need to begin the detailed discovery phase of the initiative. The goal of the Discovery Phase is to find business and technical subject matter experts who can identify and explain the data that is important to their particular silo.

Identification should cover all facets of the data. In addition to obvious sources of data such as databases, the team should look for data assets in pseudo-databases, which may include information stored in spreadsheets, in hard copy and even in small local databases. If accurate metadata is available, it will be valuable during this phase. Code reviews may help reveal usage patterns. Reporting and statistical analysis requirements should also be reviewed, since they may point to data assets that are critical to ongoing operations.

What is the Source of Those Assets?

After identifying the data assets, it becomes important to know the source of those assets. How is the data obtained? Is it entered via an online application, uploaded to a database from a source list, or generated by a batch process?

A related question that can add value in later data governance steps is how often the data is updated or accessed. Knowing whether data assets are historical in nature or used frequently can help with prioritization later on.
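To make the inventory concrete, the sketch below records each discovered asset with its silo, source, storage medium and update frequency, then orders the assets so that the most frequently used ones are reviewed first. It is a minimal Python illustration under stated assumptions: the `DataAsset` fields, the `UpdateFrequency` categories and the sample assets are all hypothetical, not a prescribed schema.

```python
from dataclasses import dataclass
from enum import IntEnum


class UpdateFrequency(IntEnum):
    """Hypothetical frequency categories; a lower value means more frequent use."""
    DAILY = 1
    TERMLY = 2
    YEARLY = 3
    HISTORICAL = 4  # archival data that is rarely, if ever, updated


@dataclass(frozen=True)
class DataAsset:
    name: str
    silo: str        # owning department or school
    source: str      # e.g. "online application", "batch process"
    storage: str     # e.g. "database", "spreadsheet", "hard copy"
    frequency: UpdateFrequency


def prioritize(assets: list[DataAsset]) -> list[DataAsset]:
    """Order assets so the most frequently updated or accessed come first.

    Ties are broken alphabetically by name so the output is deterministic.
    """
    return sorted(assets, key=lambda a: (a.frequency, a.name))


# Hypothetical inventory entries gathered during discovery.
inventory = [
    DataAsset("alumni_records", "Alumni Office", "hard copy transcription",
              "hard copy", UpdateFrequency.HISTORICAL),
    DataAsset("course_roster", "Registrar", "online application",
              "database", UpdateFrequency.DAILY),
    DataAsset("faculty_certifications", "Human Resources", "manual entry",
              "spreadsheet", UpdateFrequency.YEARLY),
]

for asset in prioritize(inventory):
    print(f"{asset.frequency.name:<10} {asset.silo:<16} {asset.name}")
```

Sorting on frequency first is only one possible prioritization rule; a team might instead weight privacy-sensitive assets, such as student data, more heavily.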

Take a Checkpoint

After the data assets and their sources have been identified, the gathered information needs to be compiled. For each silo, a list of the data assets, the sources of those assets and their definitions should be prepared. An overarching analysis of these lists will likely show that some data assets are used by more than one silo. This may be the first chance for the Data Governance team to see realizable opportunities to address data issues such as duplication, ambiguity and incompleteness.
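The overarching analysis can start as simply as inverting the per-silo lists. This minimal sketch assumes each silo's list has been reduced to a set of asset names; it maps every asset to the silos that use it and keeps only the assets shared by two or more silos. The function name and the silo and asset names are hypothetical.

```python
from collections import defaultdict


def assets_shared_across_silos(silo_assets: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each asset used by more than one silo to the sorted list of those silos."""
    silos_by_asset: dict[str, set[str]] = defaultdict(set)
    for silo, assets in silo_assets.items():
        for asset in assets:
            silos_by_asset[asset].add(silo)
    # Keep only assets that appear in at least two silos, in alphabetical order.
    return {
        asset: sorted(silos)
        for asset, silos in sorted(silos_by_asset.items())
        if len(silos) > 1
    }


# Hypothetical per-silo lists compiled at the checkpoint.
silo_assets = {
    "Admissions": {"student_id", "applicant_address", "test_scores"},
    "Registrar": {"student_id", "course_roster", "transcripts"},
    "Financial Aid": {"student_id", "applicant_address", "aid_awards"},
}

shared = assets_shared_across_silos(silo_assets)
for asset, silos in shared.items():
    print(f"{asset}: {', '.join(silos)}")
```

Exact name matching only finds assets that are already named consistently. Differently named assets that share a meaning need the definition work described in the following sections.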

How are Those Assets Currently Being Used?

With the consolidated list of data assets in hand, the team can begin determining how those assets are being used. The team will undoubtedly confront some obstacles in this phase, but without this information, the foundation of the Data Governance program will be unstable and each successive step may cause rework.

Definition scenarios can be especially challenging to decipher. Consider data assets that are ‘named’ differently but have the same meaning, whether within a single silo or across silos. For example, what do the terms ‘admission’ and ‘enrollment’ mean? Do they mean the same thing and simply follow different naming conventions? If so, imagine the confusion of a new employee who assumes the two terms represent different concepts because they are named differently. Perhaps the two terms do mean different things, but each department ascribes its own distinct meaning and there is no single holistic definition for either term. Consider, too, that these terms may historically have had completely different meanings than they do now, and some of those older meanings may still be housed in information systems that are currently in use.
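Some of these definition scenarios can be flagged mechanically once each silo's terms and definitions have been collected. The sketch below assumes hypothetical (silo, term, definition) entries and applies two checks: a term with more than one distinct definition is ambiguous, and a definition attached to more than one term suggests possible synonyms. Comparing normalized definition text is a deliberate simplification; the team, not the script, decides what the terms actually mean.

```python
from collections import defaultdict

# Hypothetical (silo, term, definition) entries gathered from interviews.
entries = [
    ("Admissions", "admission", "An applicant has been offered a place."),
    ("Registrar", "admission", "A student has registered for at least one course."),
    ("Registrar", "enrollment", "A student has registered for at least one course."),
    ("Financial Aid", "enrollment", "A student is registered at least half time."),
]


def normalize(text: str) -> str:
    """Crude normalization: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def find_definition_conflicts(
    entries: list[tuple[str, str, str]],
) -> tuple[list[str], list[list[str]]]:
    """Return (ambiguous_terms, possible_synonyms).

    ambiguous_terms: terms that carry more than one distinct definition.
    possible_synonyms: groups of different terms sharing one definition.
    """
    defs_by_term: dict[str, set[str]] = defaultdict(set)
    terms_by_def: dict[str, set[str]] = defaultdict(set)
    for _silo, term, definition in entries:
        t, d = normalize(term), normalize(definition)
        defs_by_term[t].add(d)
        terms_by_def[d].add(t)
    ambiguous = sorted(t for t, defs in defs_by_term.items() if len(defs) > 1)
    synonyms = sorted(sorted(ts) for ts in terms_by_def.values() if len(ts) > 1)
    return ambiguous, synonyms


ambiguous, synonyms = find_definition_conflicts(entries)
print("Terms with conflicting definitions:", ambiguous)
print("Different terms, same definition:", synonyms)
```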

How Should Those Assets be Used?

Now that the current use of the data is understood, it is time to determine how the data ‘should’ be used. Often the ‘currently used’ answer differs from the ‘should be used’ answer. Duplicated data, data inconsistencies and misleading data definitions are all prime candidates for review.

The principal data quality standards and validations will naturally begin to take shape during this step. At this point, the team may want to begin building an initial ‘business glossary’, which can give meaning and definition to the data asset terms and set standards that facilitate clear communication for the rest of the Data Governance process.
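One way to connect the business glossary to data quality standards is to store a validation rule alongside each term's definition. The sketch below is a hypothetical illustration: the `GlossaryTerm` structure, the “S plus seven digits” ID format and the allowed enrollment statuses are invented for the example and do not describe any real institution's standards.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GlossaryTerm:
    name: str
    definition: str
    is_valid: Callable[[str], bool]  # data quality rule for values of this term


# Hypothetical glossary entries; the formats below are illustrative only.
GLOSSARY = {
    "student_id": GlossaryTerm(
        "student_id",
        "Institution-assigned identifier, unique per student.",
        lambda v: re.fullmatch(r"S\d{7}", v) is not None,
    ),
    "enrollment_status": GlossaryTerm(
        "enrollment_status",
        "Registration status for the current term.",
        lambda v: v in {"full-time", "part-time", "not enrolled"},
    ),
}


def validate(record: dict[str, str]) -> list[str]:
    """Return the data quality problems found in one record."""
    problems = []
    for field, value in record.items():
        term = GLOSSARY.get(field)
        if term is None:
            problems.append(f"{field}: not defined in the business glossary")
        elif not term.is_valid(value):
            problems.append(f"{field}: invalid value {value!r}")
    return problems


print(validate({"student_id": "S1234567", "enrollment_status": "full-time"}))
print(validate({"student_id": "12345", "enroll_stat": "FT"}))
```

Fields missing from the glossary are reported as well, since an undefined term is itself a governance gap.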

Who is the Owner?

One of the most critical parts of any data governance approach is the identification of data stewards. Data stewards are considered the ‘owners’ of the data assets. They are responsible for ensuring that the data within their purview meets quality standards, answers a business need and is appropriately available to authorized users. Data stewards are the champions of the data and the ultimate layer of quality control. Typically, their job function depends on the data they own, so they have a vested interest in maintaining it properly.
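Once stewards are named, a simple check can confirm that ownership covers the whole inventory. This sketch assumes that each asset should have exactly one steward and uses hypothetical steward titles, asset names and a hypothetical `steward_gaps` helper; it reports assets with no steward and assets claimed by more than one.

```python
from collections import Counter


def steward_gaps(
    assets: set[str], stewardship: dict[str, set[str]]
) -> tuple[list[str], list[str]]:
    """Return (unowned, contested) asset names.

    unowned: inventoried assets with no assigned steward.
    contested: assets claimed by more than one steward.
    """
    # Count how many stewards claim each asset.
    claims = Counter(asset for owned in stewardship.values() for asset in owned)
    unowned = sorted(a for a in assets if claims[a] == 0)
    contested = sorted(a for a, n in claims.items() if n > 1)
    return unowned, contested


# Hypothetical inventory and steward assignments.
assets = {"student_id", "course_roster", "aid_awards", "alumni_records"}
stewardship = {
    "Registrar": {"student_id", "course_roster"},
    "Director of Financial Aid": {"aid_awards", "student_id"},
}

unowned, contested = steward_gaps(assets, stewardship)
print("No steward:", unowned)
print("Multiple stewards:", contested)
```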

Applying a Data Governance Maturity Approach

All the data lessons learned have been leading up to this point. This is when the true Data Governance maturity phase can begin. Depending on the outcome of the previous investigations, a decision can be made about the most appropriate Data Governance model and approach. With the approach defined, it is time to begin the synchronization efforts that will bring the silos together into a coherent whole.

Learning the Right Lesson

Data Governance is not a one-time event. Data must be consistently viewed as an asset, and the culture must recognize and support the ongoing protection of Data Governance processes. With the right sponsor, the evolutionary process of building an ongoing, governed approach to data quality and protection will provide significant benefits both now and in the future. To enable that future, however, the culture must adapt to one that recognizes the value of the data and, as a result, embraces a Data Governance mindset.

Before academic institutions can learn the lessons that their data assets provide, they have to build the required foundational knowledge. Data Governance provides that foundation.

Keesa Bond
Keesa Bond describes her technical interests as being those of an investigative Data Scientist. Throughout her career in academia, Keesa has found creative ways to use technology to make data more meaningful for those it should serve. She knows that the stories that data can provide are there, but realizes that data inaccuracies often invalidate the story’s ending. Her search for data validity has led her to the realization that Data Governance is foundational to ensure accurate, reliable, actionable data.
