This document discusses the importance of metadata and data governance. It describes how a data catalog can consolidate metadata from various sources like a business glossary, data dictionary, and data profiling. Automating data lineage is key to harvesting metadata at scale and establishing relationships between different metadata objects. When integrated in a data catalog, metadata provides a single source of truth about an organization's data that improves data literacy and trust.
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Data Dictionary or a Business Glossary
2. Amichai Fenner, Product Lead, Octopai
With over 7 years of experience working as a full-stack BI expert, Amichai has expertise in BI methodology and architecture, as well as technical skills in various BI tools, from ETLs to reporting and analytics. He currently manages Octopai’s automated data catalog.
3. Malcolm Chisholm, Ph.D., President, Data Millennium
A thought leader, author, and speaker in data governance and data management, Malcolm has over 25 years of experience in data-related disciplines and has worked in a variety of sectors, including finance, manufacturing, government, pharmaceuticals, and telecoms. Malcolm has been awarded the prestigious DAMA International Professional Achievement Award for contributions to Master Data Management and Reference Data Management.
5. High-Level Metadata Storage
Business Glossary
• Manage Terminology for both Information and Data Concepts
• Manage Definitions
• Manage Classifications
Data Dictionary
• Schema > Table > Column Structural Metadata
• Data Profiling Information
• Data Universe Information
• Other Relational Data Objects, e.g. Views
Data Catalog
• Information on Files, Datasets
• Information on Reports, Other Data Assets
• Attaches definitions to data assets
[Diagram arrow labels: "Provides Terminology and Semantics for"; "Provides Data Structures / Profiles for"]
7. Data Catalogs Need Content
[Chart. Axes: Time, Level of Content. Curves: "Data Catalog based on automation" and "Data Catalog based on user input". Annotations: "Production rollout of Data Catalog with automation" and "Minimum level of content needed for business adoption".]
9. Metadata Consolidation

Database: table CUST_MSTR
CFN | CMI | CL
Immanuel | - | Kant
Georg | W | Hegel

Reports: Customer Profile
Customer First Name | Customer Middle Initial | Customer Last Name
Immanuel | - | Kant
Georg | W | Hegel

Reports: Daily Customer Tracking
First Name | MI | Last Name
Immanuel | - | Kant
Georg | W | Hegel

Business Glossary
Business Term | Synonym of
Customer First Name | -
First Name | Customer First Name
Customer Middle Initial | -
MI | Customer Middle Initial
Customer Last Name | -
Last Name | Customer Last Name

Data Catalog Functionality: Consolidated View (Data Catalog)
Business Term | Synonym of | Report | Database Column | Database Table
Customer First Name | - | Customer Profile | CFN | CUST_MSTR
First Name | Customer First Name | Customer Daily Tracking | CFN | CUST_MSTR
Customer Middle Initial | - | Customer Profile | CMI | CUST_MSTR
MI | Customer Middle Initial | Customer Daily Tracking | CMI | CUST_MSTR
Customer Last Name | - | Customer Profile | CL | CUST_MSTR
Last Name | Customer Last Name | Customer Daily Tracking | CL | CUST_MSTR
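The consolidation on this slide can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular catalog product works: the class names (GlossaryTerm, ReportField) and the consolidate function are hypothetical, and the report-to-column mappings are typed in by hand here, whereas the webinar's point is that automation should supply them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GlossaryTerm:
    """One business glossary entry (hypothetical structure)."""
    term: str
    synonym_of: Optional[str] = None  # preferred term, if this entry is a synonym


@dataclass(frozen=True)
class ReportField:
    """Where a report label is sourced from (hypothetical structure)."""
    report: str
    label: str   # heading as it appears on the report
    column: str  # physical database column
    table: str   # physical database table


glossary = [
    GlossaryTerm("Customer First Name"),
    GlossaryTerm("First Name", "Customer First Name"),
    GlossaryTerm("Customer Middle Initial"),
    GlossaryTerm("MI", "Customer Middle Initial"),
    GlossaryTerm("Customer Last Name"),
    GlossaryTerm("Last Name", "Customer Last Name"),
]

# Entered by hand for illustration; in practice this would be harvested.
report_fields = [
    ReportField("Customer Profile", "Customer First Name", "CFN", "CUST_MSTR"),
    ReportField("Customer Profile", "Customer Middle Initial", "CMI", "CUST_MSTR"),
    ReportField("Customer Profile", "Customer Last Name", "CL", "CUST_MSTR"),
    ReportField("Customer Daily Tracking", "First Name", "CFN", "CUST_MSTR"),
    ReportField("Customer Daily Tracking", "MI", "CMI", "CUST_MSTR"),
    ReportField("Customer Daily Tracking", "Last Name", "CL", "CUST_MSTR"),
]


def consolidate(terms, fields):
    """Join glossary terms to report fields on the report label."""
    by_label = {f.label: f for f in fields}  # assumes labels are unique
    rows = []
    for t in terms:
        f = by_label.get(t.term)
        rows.append((
            t.term,
            t.synonym_of or "",
            f.report if f else "",
            f.column if f else "",
            f.table if f else "",
        ))
    return rows


rows = consolidate(glossary, report_fields)
for row in rows:
    print(" | ".join(row))
```

The join key here is the report label, which works only because each label appears on a single report in this example; a real catalog would need a more robust key.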
10. Problems
1. How do you collect the metadata?
2. How does all the metadata get related (how do you establish relationships among it)?
3. How do you keep it updated?
11. Technical Metadata Needs Automation to Gather It
• The scale and complexity of data ecosystems are just too large for human effort
• How do you find the relations among the metadata?
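As a small illustration of what automated harvesting of technical metadata can look like, the sketch below reads Schema > Table > Column information straight from a database's own system catalog instead of having people type it in. It assumes a SQLite database and uses Python's standard sqlite3 module; the function name harvest_columns is hypothetical, and the CUST_MSTR table is recreated in memory only so the example runs on its own.

```python
import sqlite3


def harvest_columns(conn):
    """Return (table, column, declared_type) for every user table."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    harvested = []
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
        for _cid, name, col_type, *_rest in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            harvested.append((table, name, col_type))
    return harvested


# Stand-in database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUST_MSTR (CFN TEXT, CMI TEXT, CL TEXT)")

metadata = harvest_columns(conn)
for table, column, col_type in metadata:
    print(table, column, col_type)
```

Other database engines expose similar system catalogs, so the same idea applies, but the queries differ per engine. This sketch collects structure only; it does not establish relationships between metadata objects.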
12. Data Lineage
Data Lineage can harvest metadata and build relationships among it
At a Very High Level
[Diagram repeated from slide 5, "High-Level Metadata Storage": Business Glossary, Data Dictionary, and Data Catalog, with the arrow labels "Provides Terminology and Semantics for" and "Provides Data Structures / Profiles for".]
13. Data Traceability
• Data traceability has well-understood use cases in impact analysis (if something is changed, what will be impacted?)
• Data lineage is similarly well understood (where did the data in this report come from, especially if it seems to be in error? Or what broke my ETL process?)
• But data traceability is also becoming a general data governance requirement, for example under BCBS 239, where you have to prove that data in reports comes from operational environments
[Diagram: Risk Data Mart, Dataset Processing Environment, Risk Reports, Manual Adjustment]
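Impact analysis and lineage are the same question asked in opposite directions over a graph of data flows, which the following sketch tries to show. The graph is an assumption: the node names come from the slide's diagram, but the direction of each edge and the "Operational System" node are guesses added for illustration, and the helper names reachable and reverse are hypothetical.

```python
from collections import deque

# Directed data-flow edges: source -> targets.
# ASSUMPTION: edge directions and "Operational System" are illustrative only.
edges = {
    "Operational System": ["Risk Data Mart"],
    "Risk Data Mart": ["Dataset Processing Environment"],
    "Dataset Processing Environment": ["Risk Reports"],
    "Manual Adjustment": ["Risk Reports"],
}


def reachable(graph, start):
    """Breadth-first search: every node reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph):
    """Flip every edge so that traversal walks upstream."""
    flipped = {}
    for source, targets in graph.items():
        for target in targets:
            flipped.setdefault(target, []).append(source)
    return flipped


# Impact analysis: if the data mart changes, what is affected downstream?
impact = reachable(edges, "Risk Data Mart")

# Lineage: where does the data in the reports come from?
lineage = reachable(reverse(edges), "Risk Reports")

print("Impacted by a Risk Data Mart change:", sorted(impact))
print("Upstream of Risk Reports:", sorted(lineage))
```

With edges like these, the upstream walk surfaces the manual adjustment alongside the operational source, which is the kind of evidence a traceability requirement asks for.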
14. Conclusion
• Business Glossary, Data Dictionary, and Data Catalog each have a different focus in terms of the types of metadata they manage
• But there are relationships between them
• The Business Glossary gives business meaning to the technical metadata, which is not otherwise understandable by businesspeople
• Automation is needed to harvest metadata
• Data Lineage is a great way to do this and establish the needed relationships in the metadata
• Data Lineage is essential for creating trust in the data by providing full traceability
• The Data Catalog then becomes the place where all this information is integrated, and becomes the one-stop shop to understand and collaborate about data
15. Data teams face major challenges
Lack of visibility & control of data and business knowledge scattered throughout the data eco-system
16. Main challenges in the data eco-system
• Loss of tribal knowledge
• Inefficient use of data & lack of independence in using data
• Single Source Of Truth
• Increased pressure on the data team for analytics & reports
• Ever-growing amount of data in the organization
18. Achieving Data Literacy
Leverage automation to create one source of the truth for your data
Data Lineage: Trace any data end-to-end through your entire data eco-system, in seconds.
Data Discovery: Find the data you need anywhere in your data eco-system, in seconds.
Data Catalog: Create company-wide consistency with a self-creating, self-updating data catalog.
20. An effective data catalog will help your users answer questions such as:
o Where should I look for my data?
o Does this data matter?
o What does this data represent?
o Is this data relevant and important?
o How can I use this data?
[Figure: Before / After]
Data Catalog connects all data citizens in your eco-system
22. THANK YOU
Got any questions?
Malcolm Chisholm, President of Data Millennium
mchisholm@datamillennium.com
Amichai Fenner, Product Lead, Octopai
amichaif@octopai.com