It’s important to note that I didn’t title this post “Implementing a Data Governance Architecture”. Data governance is not a technology space, tool or architecture. As our data governance framework illustrates, tools and architecture represent just one of many facets needed to support an enterprise data governance competency. But once you’ve defined your vision and business case with a clear approach for managing the people, process and policy facets, technology can play a significant role in determining the ultimate success or failure of your data governance efforts. Complex and poorly integrated current-state architectures present a significant obstacle to applying common standards for the delivery of trusted and secure data across the enterprise. Data architects play a pivotal role in enabling data governance by designing and evangelizing the data management reference architecture to support data quality and privacy requirements. In addition, these architects must recommend enabling technologies to support data governance and stewardship workflows that aid the core processes of discovery, definition, application, and measurement and monitoring. (Stay tuned: I’ll be sharing a lot more about these core data governance processes in a future post discussing the “Defined Processes” facet of our framework.) Whatever you do, don’t fall into the all-too-common IT trap of selecting the tools before the goals, strategy and processes of data governance are in place. If you skip these steps and just try to build it, they (‘the business’) most assuredly will NOT come.
Architectural Components To Consider For Holistic Data Governance
As the graphic below represents, enterprise and data architects must consider the full lifecycle of critical enterprise data. This includes:
- Traditional upstream on-premises transactional/operational applications, systems and processes that create, update, import, or purchase data.
- Traditional downstream on-premises analytical applications, systems and processes that consolidate, reconcile, deliver and consume data.
- Exploding growth of off-premises sources and targets of data including Cloud-based applications and platforms, Social data, Mobile devices, third party data feeds, sensor data, and Hadoop analytic environments.
- Supporting data management infrastructure that enables and ensures compliance with your organization’s unique requirements for delivery of the “Right data at the right time with the right latency of the right quality and security in the right context”. Right? :-)
- Assessment and delivery of the Shared Capabilities that must be made available across your enterprise data architecture – and not confined within specific applications or tools. A common approach is to invest in a single-vendor data management platform that delivers many of these capabilities through pre-packaged data services. But many organizations instead design best-of-breed architectures that leverage existing software and infrastructure investments and deliver these shared capabilities through service-oriented architectures or similar data services approaches, to ensure standardization, reuse and policy compliance across their data ecosystems.
Many of our clients also ask what specific enabling software capabilities they should consider to help get their data governance efforts off the ground. Some early investments to consider include:
- Data profiling. Data profiling software helps business analysts and data stewards answer the questions “What does our data look like today?”, “How does data in one system relate to data in another system?” and “What rules and policies should we consider defining to improve it?”
- Data discovery. While data profiling allows in-depth analysis of specified data sets, data discovery allows you to identify where any data anomaly or business scenario occurs across all data sources. Many use data discovery to figure out “What data do I have that is relevant to this analysis or business decision?” Example: “I’m building a 360-degree customer view. What data do I have that is relevant to the kind of customer view that I want to create?” As another example, your data privacy organization may need to identify where Personally Identifiable Information is used, and how that relates to the specific business rules and processes where obfuscation or data masking must be applied.
- Business glossary. A business glossary allows your business and IT stewards to capture and share the full business context around your critical data. In addition to the expected definitions of core data entities and attributes, context can also include rules, policies, reference data, free-form annotation, links, and data owners, to name a few. Many organizations simply manage their shared definitions in Word documents or spreadsheets, an approach that typically captures the terms and definitions but misses the broader context. A strong packaged business glossary enables collaboration across the business and IT roles that create, approve, and consume these definitions – minimizing the risk of redundancy, definition stagnation and versioning conflicts.
- Metadata management/data lineage. The ability to reconcile the supporting metadata of your most critical data, and to make it transparent and visible, is a foundational element of your data management reference architecture. Data lineage visualization and auditing capabilities also allow data architects and stewards to perform impact analysis of potential changes to data definitions, rules or schemas – as well as root cause analysis when responding to a data quality or security failure. IT staff ranging from data modelers, enterprise architects and business systems analysts to developers and DBAs often manage the technical metadata, while business analysts and business stewards are often responsible for the business-oriented metadata. Ideally your business glossary should be well integrated with your metadata solution.
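To make the data profiling item in the list above a little more concrete, here is a minimal sketch of the kind of summary that answers “What does our data look like today?” It is not how any particular profiling product works. The `profile_column` function and the sample `customers` records are hypothetical, invented purely for illustration.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize one column: row count, nulls, distinct values, top value."""
    values = [row.get(column) for row in rows]
    # Treat None and empty strings as missing values.
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


# Hypothetical sample records (not from any real system).
customers = [
    {"id": 1, "country": "US", "email": "a@example.com"},
    {"id": 2, "country": "us", "email": ""},
    {"id": 3, "country": "US", "email": None},
    {"id": 4, "country": "DE", "email": "d@example.com"},
]

country_profile = profile_column(customers, "country")
email_profile = profile_column(customers, "email")
print(country_profile)
print(email_profile)
```

Even a summary this small surfaces candidate rules for the stewards to discuss: the country column mixes “US” and “us”, and half of the email values are missing.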
In addition to the above, architects should also consider where and how they want to manage their data modeling, process modeling, data quality, data privacy, master data, data monitoring and auditing, workflow management, and collaboration capabilities. They must also determine how data stewards will be notified of exceptions to any established data quality or privacy rules, and how those stewards should mitigate them.
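As a rough illustration of routing rule exceptions to stewards, the sketch below checks records against a couple of data quality rules and groups the failures by the steward responsible for each rule. Everything here is an assumption made for the example: the `Rule` class, the `find_exceptions` function, the rule names and the steward addresses are hypothetical, and a real implementation would hand the result to whatever workflow or notification tooling your organization uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    steward: str                   # hypothetical contact for exceptions
    check: Callable[[dict], bool]  # returns True when the record passes


def find_exceptions(records, rules):
    """Group (rule name, record id) pairs for failed checks by steward."""
    by_steward = {}
    for record in records:
        for rule in rules:
            if not rule.check(record):
                by_steward.setdefault(rule.steward, []).append(
                    (rule.name, record["id"])
                )
    return by_steward


def is_two_letter_upper(record):
    """Illustrative rule: country must be a two-letter uppercase code."""
    country = record.get("country")
    return isinstance(country, str) and len(country) == 2 and country.isupper()


rules = [
    Rule("email_present", "customer.steward@example.com",
         lambda r: bool(r.get("email"))),
    Rule("country_code_format", "reference.steward@example.com",
         is_two_letter_upper),
]

# Hypothetical sample records.
records = [
    {"id": 1, "country": "US", "email": "a@example.com"},
    {"id": 2, "country": "us", "email": ""},
]

exceptions = find_exceptions(records, rules)
for steward, items in exceptions.items():
    print(steward, items)
```

Attaching a steward to each rule, rather than to each system, is only one possible design choice; it keeps accountability with whoever owns the definition of the rule.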
What I’ve shared here is fairly generic, meant to be used only as a guide for enterprise architects and data governance program drivers. Every organization will need to assess its own unique current state architecture and technology capabilities, and design its optimized future state data management reference architecture based on its data governance vision and business case. While the business must own the definitions of trusted, secure data and be held accountable for the business impacts of that data – designing an effective supporting architecture with recommendations for the most appropriate enabling technologies to support all the roles within a data governance organization is a job for IT.
Hey Rob, I’ve got a question that I’d like your opinion on (and anyone else in the data governance community).
I’m interested if people feel that retaining versioning is important for data lineage.
Here’s an example.
Let’s say my data lineage shows me that my report named “Monthly Revenue” comes from Datawarehouse A. Datawarehouse A crunches data that comes from Old Legacy System Z.
But, that was only true until today. Today, Old Legacy System Z was replaced by New System. Today, my lineage (in English) becomes “New System feeds Datawarehouse A. Datawarehouse A adds 5 lines of business together and gives me Monthly Revenue Report”.
My question: Next year, should I be able to go back in time and see where this change in my lineage took place?
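To make the question concrete, here is a minimal sketch of what retaining versioning could look like, assuming each lineage edge carries a validity period. The `LineageEdge` class, the `lineage_as_of` function and all of the dates are hypothetical, chosen only to mirror the example above; this is one possible model, not a description of any lineage tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LineageEdge:
    source: str
    target: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the edge is still current


def lineage_as_of(edges, target, as_of):
    """Return the sources that fed `target` on the given date."""
    return sorted(
        e.source
        for e in edges
        if e.target == target
        and e.valid_from <= as_of
        and (e.valid_to is None or as_of < e.valid_to)
    )


# Hypothetical cutover date standing in for "today" in the example.
cutover = date(2024, 6, 1)

edges = [
    LineageEdge("Old Legacy System Z", "Datawarehouse A", date(2015, 1, 1), cutover),
    LineageEdge("New System", "Datawarehouse A", cutover),
    LineageEdge("Datawarehouse A", "Monthly Revenue", date(2015, 1, 1)),
]

before = lineage_as_of(edges, "Datawarehouse A", date(2024, 5, 31))
after = lineage_as_of(edges, "Datawarehouse A", cutover)
print(before)
print(after)
```

With edges closed out rather than overwritten, asking “what fed Datawarehouse A on a given date?” still works a year later, and the cutover date itself is recorded on the edges.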