SlideShare una empresa de Scribd logo
1 de 25
FAIR Data-centric
Information Architecture:
It's all fun and games until
someone uses an “I”
Ben Gardner
Data Standards & Interoperability, Data Office, Data
Science & Artificial Intelligence, R&D, AstraZeneca,
Cambridge, United Kingdom
September 2023
What is Data Centricity?
2
Data-Centricity puts
data at the centre of
the enterprise.
Applications are
optional visitors to the
data. (Data-centric manifesto)
Data-centricity involves structuring our data around the science that we do rather
than the systems that we use. It promotes data reusability over system-centric design.
Defining our data-centric transformation using FAIR metrics
3
L1
•Data is catalogued in situ
•Raw data in an
unconformed format,
external to a data lake
•Requires expert technical
and subject matter
knowledge to access and
use
L2
•Raw data catalogued and
accessible with
governance in a data lake,
unconformed.
•Requires average
technical and expert
subject matter
knowledge to use
L3
•Conformed data
catalogued and available
in a data lake in
accordance with AZ
patterns
•Data is conformed on a
system by system basis
and may not map to a
domain data model
•Controls on access at the
system level
•Requires average level
technical and subject
matter knowledge to use:
Data Scientist.
L4 – Domain
level FAIR
•Data conformed,
integrated, processed and
audited to support
specific analytics
patterns/enquiries
•Data is conformed using a
domain level data model
•Data embeds local master
and reference data
•Controls on access at the
data level
• Creation of analytics
ready 'Marts' enabling
self service analytics:
Citizen Data Scientist
L5 – Enterprise
Level FAIR
•Extend L4 further by:
•Data is described in a
cross domain data or
industry model and is
integrated in a Data Mesh
•Data embeds enterprise
master and reference
data
•Enterprise Metadata
standard applied
•URI’s and PURLs are
implemented
•Cross domain Analytics
enabled: Citizen Data
Scientist using all relevant
AZ data
L6 – Knowledge
Level FAIR
•Extend L5 further by:
•Data is fully described
using a knowledge
language or ontology
•Data is aggregated by
business concepts and
users can navigate from
concept to concept
•Automated AI enabled: AI
and semantic solutions
can act directly on data
sets without need for
interpretation
•Knowledge enabled
citizens
Evolution of System Centric Thinking Evolution of Data Centric Thinking to Knowledge Centric
Sweet spot for many AZ domains –
where deep science and/or broad
data integration is not required
Some data domains will require additional investment to support scientific
use cases. This maximises use of Enterprise data and fully supports AI
Requires strong and increasing linkage of FAIR with TRUST
System-centric Catalogs and API’s only, giving minimal FAIR capabilities
Credit to Information Architecture and Ben Gardner
The value of data centricity
4
Junk Yard
Car Boot Sale
Moroccan Street
Market
Local Market
Hyper Market
Amazon Prime
• Random stuff by car
• Ordered by seller
• Sellers have assigned pitches
• No insights about what other sellers have
• Random level of quality i.e. first edition vs
latest edition book
• Randomly distributed stuff
• Lots of effort to find stuff
• Easy to miss what you are looking for
• Only the 'owner' knows where stuff might be at best
• Grouping by product category i.e. spices
vs carpets vs etc
• Improved level of quality
• Seller can explain the provenance of the
product
• Good intentions but not really scalable
• Well organised
• Lots of categories
• Well labelled items including
provenance (made where)
• Limited produce
• Scale and organisation at the
next level
• Everything in once place
• Store guide/map
• Optimised for
society/community
• Specialist by product category
• Data driven display/groupings -
I.e. Christmas, summer vs winter;
mining shopping patterns,
context sensitivity
• Click and collect service
• Price comparison with other
retailers
• Digital, no longer physically
constrained
• Recommendation engines -
other categories/products
related to this, push
information to you vs pull
provides information on
request
• Scale of products and
variety of manufactures
• Fuzzy search - helps me find
what I might be looking for -
I'll know it when I find it
searches
• Amazon subscription
services - schedule delivery,
Dash buttons
• Alexa - AI guided
• Services - Music and video
• Market place for other
vendors
Think about your shopping
experience….
Credit to Graham Henry, Sandra McGarry, Kerstin Forsberg and Ben Gardner
Building Knowledge Graphs
A three legged stool
5
“Start with meaning”
Dave McComb, Semantic Arts
Stool built by Derek Scuffell
Ontology Architecture overview
6
A conceptual model that
provides the minimum linking
of shared entities used across
domain and application
models.
Entity Ontologies describe
individual concepts and act
as building blocks for
importing into application
ontologies.
Application Ontologies are
built to support a specific
application.
Honeycombs used to
communicate concept of
knowledge map, illustrate use
case coverage and organic
evolution.
Credit to Jon Ison
Conceptual Model Entity Ontology Application Ontology Honeycomb
Honeycomb motif inspired by Dan Gschwend - Amgen’s Enterprise Data Fabric, DCAF 2021
Anatomy of an Entity Ontology
A model of a high level concept within a knowledge domain
7
Relationships
Controlled Vocabularies
Information
(Data Properties)
Strategy to organically grow a knowledge map
Increasing the breadth, depth and complexity of questions enabled
8
Model evolves based on emerging new scientific use cases
Foundational CV’s deliver Quality all the way up
9
Silo’ed picklists
Collections
Local CV
Foundational CV
Uncoordinated list of strings
used by an application
Well designed atomic controlled
vocabulary built to a common
standard
Narrow coverage for a
domain/application
Decided by use case specific
governance, enabled by editorial
capability & appropriate tools
Well designed atomic controlled
vocabulary built to a common
standard
Broad coverage for multiple
domains/applications
Decided by SMEs, enabled by
specialist curators
A collection of terms derived
from existing atomic controlled
vocabularies that meet specific
application needs/use cases
SKOS-XL
https://pid.astrazeneca.net/ref/cvname/{ID}
Credit to Varsha Khodiyar
Controlled Vocabularies
SKOS-XL supports a multitude of labels and signifiers
A pool of lexical labels exist for each concept. They are common use OR attributed to systems and
vocabularies. AZ curators decide which one will be preferred (for AZ) and whether other labels will be
alternative or hidden. Each label should be further characterized by a signifier.
Alternative Labels
Well defined non-case variant, alternative
labels that are used for this concept. –
some may call these “synonyms”
Preferred Labels
The preferred label for AstraZeneca, that
resonates with the business vernacular
Hidden Labels
Common mis-representations (spelling
mistakes, etc) of the concept that exist and
we don’t want used by humans. Often used
to support NLP and AI activity
“ctDNA”@en
C113243
Pref Label
NCI Thesaurus
Identifier Scheme
“Circulating Tumor DNA” ->[NCIT]
Alt Label (attributed)
“circulating tumor DNA”@en-us
Alt Label
“cell-free circulating
tumour DNA”@en
Alt Label
“ctdDNA”@en
Alt Label
“cDNA” -> Spelling
Hidden Label
“cell-free circulating
tumor DNA”@en-us
Alt Label
we choose you! (as our
preferred label)
Credit to Nathalie Conte, Derek Scuffell & Jon Ison – OWG Summer project team
Data
Domain level FAIR (L4) a great first step
11
Data Catalogue
Credit to – Data Office Collibra team
Find - Data registered in Collibra and tagged with CMM
Access – Access controlled via Collibra request service
Reuse – Documentation captured against data
But we are FAR from FAIR
We MUST go further
12
Find - Data registered in Collibra
and tagged with CMM
Access – Access controlled via
Collibra request service
Reuse – Documentation captured
against data
Find - Data registered in Collibra and
tagged with CMM
Access – Access controlled via Collibra
request service
Interoperable – Data enhanced with
shared CV and PIDs
Reuse – Documentation captured
against data, includes data dictionary, etc
Data record discoverable in Collibra
Data enriched to create interoperability
Data is Machine Readable
FAIR metrics (Level 4) FAIR metrics (Level 5)
Data record discoverable in Collibra
Data Catalog
(Limited Scope)
FAIR
reference
data
FAIR
master
data
Purl
Server
Controlled
Vocabularies
Data Catalog
(Limited Scope)
Enterprise Level FAIR (L5)
A FAIRe(nough) data product
13
Data
Controlled
Vocabularies
FAIR Data Product
Findable Accessible Interoperable Reusable
Registered and
discoverable in a Data
Catalogue
Mechanism for requesting
and receiving data
The data has been aligned
to AZ standards where
they exist
Documentation describing
the constraints associated
with using the data
The PID for each instance
in the standard is included
to make the data machine
readable
Documentation describing
the data i.e. data
dictionary, schema, etc
The minimum viable FAIR Data standard should deliver
Catalogue
Data and Controlled Vocabularies
Putting Interoperability into FAIR
14
Data
Transform
Controlled Vocabularies
Term prefLabel
& PID
Dirty data Interoperable data
prefLabel PID
• Inconsistent identifiers & terms
• Column values can be concatenated
• etc
• Shared Controlled Vocabularies
• Enrich with preferred label and PIDs
• Uses common files format CSV, JSON, etc
• Is machine readable, graph enabling and
relationally world friendly
• Well documented - Data Dictionary/Data
Schema/etc
L5 FAIR Data Products benefit all
Inclusion of PIDs simplifies data integration irrespective of target data model
15
L5 FAIR Data Product
Relational
Graph
Aligning entities with controlled vocabularies is key
16
Entity
Ontology
Application
Ontology
Build from
Foundational
Vocabularies
Build from
Collections
Ontologies
(things and relationships)
Vocabularies
(terms that categorise things)
A model built from
one or more entity
ontologies and
extended as required
based on use case
Discrete models of
things within a domain
that provide the
foundational building
blocks
Atomic sets of terms that
are used to describe the
things identified within
entity ontologies
The collection of term sets
and terms from one or more
controlled vocabularies that
are required to support a use
case
Emerging CV demand drives evolution of Entity Ontologies
Critical Data Element and DDWG provide another path to enhancing our model
17
Credit to – Jon Ison, Arinjay Jadeja and Caroline Hellawell
Model and Data
Strings to things a mapping standard
18
Credit to – Jon Ison, Nicola Ellingham and Arinjay Jadeja
Knowledge Level FAIR (L6)
A FAIR data product minimises the gap between relational and graph worlds
19
L4 FAIR Data Controlled
Vocabularies
Reference Model
RDF Serialisation
Knowledge Map
L5 FAIR Data Product
• An enterprise standard
• Adds value for everyone
• Two thirds of the way to graph without
impacting non-graph consumers
Model, Controlled Vocabularies and Data
In a data-centric world the PIDs just snap the data and model together
20
Bringing it all back together
21
Controlled
Vocabularies
Reference
Model
FAIR Data
Product
Ontology Architecture
• Scalable and sustainable
• Reusable library of entity
ontologies
• Organic evolution of
reference ontology
Controlled Vocabularies
• RDM service implemented
• SKOS-XL based CV framework & workflow
designed
• Editorial controls in place
• Prioritisation & creation of CV underway
FAIR Data
• FAIR metrics goal 80%
Data FAIR by 2025
• Standards, resources,
tooling and governance
• FAIR assessment in place
• PID server and standard
patterns agreed
Omics Imaging Patient
Knowledge Map
(Data-centric)
Data is FAIR and aggregated by
business concept (semantic)
Platform Data Products
(Platform-centric)
Data is FAIR but fragmented
across domains
From System-centric to Data-centric
Reference Architecture
Application Data
(System-centric)
Data is aligned around
processes
What should I do?
Focus on utilising our Controlled Vocabularies and enriching your data
23
Collections
Local CV
Foundational CV
Catalogue your data Commit to reusing &
improving our CV estate
Enrich your data with
prefLabel & PID
Data Catalogue Controlled Vocabularies L5 FAIR Data Product
It takes village ……
24
DF+I
Daniel Roythorne
Jon Ison
Nathalie Conte
Nicola Ellingham
Arun Balaji
Induja Mohan
Arinjay Jadeja
Ben Gardner
Hans Ienasescu
Bhavna Khilnani
Michael Neylon
Mathew Woodwark
Rob Hernandez
Collaborators
Derek Scuffell
Varsha Khodiyar
Pablo Porras Millan
John Berrisford
Bijay Jassal
Rafa Jimenez
Philippe Rocca-Serra
Victor Kim
Alex Wood
Linda Zander-Balderud
Antonio Fabregat
Mark Reuter
Tom Plasterer
James Holman
Martina Devoti
Stacy Mather
Di Elvers
Colin Wood
Sandra Mc Garry
Gareth Henry
Kerstin Freberg
Calle Nordmark
Confidentiality Notice
This file is private and may contain confidential and proprietary information. If you have received this file in error, please notify us and remove
it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the
contents of this file is not permitted and may be unlawful. AstraZeneca PLC, 1 Francis Crick Avenue, Cambridge Biomedical Campus,
Cambridge, CB2 0AA, UK, T: +44(0)203 749 5000, www.astrazeneca.com
25

Más contenido relacionado

La actualidad más candente

Enterprise Data Management
Enterprise Data ManagementEnterprise Data Management
Enterprise Data ManagementBhavendra Chavan
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationDATAVERSITY
 
EY + Neo4j: Why graph technology makes sense for fraud detection and customer...
EY + Neo4j: Why graph technology makes sense for fraud detection and customer...EY + Neo4j: Why graph technology makes sense for fraud detection and customer...
EY + Neo4j: Why graph technology makes sense for fraud detection and customer...Neo4j
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
The Role of Data Governance in a Data Strategy
The Role of Data Governance in a Data StrategyThe Role of Data Governance in a Data Strategy
The Role of Data Governance in a Data StrategyDATAVERSITY
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureDATAVERSITY
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best PracticesDATAVERSITY
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogDATAVERSITY
 
Data Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and GovernanceData Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and GovernanceDenodo
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture DesignKujambu Murugesan
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
DAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best PracticesDAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best PracticesDATAVERSITY
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
Data Governance — Aligning Technical and Business Approaches
Data Governance — Aligning Technical and Business ApproachesData Governance — Aligning Technical and Business Approaches
Data Governance — Aligning Technical and Business ApproachesDATAVERSITY
 
Data Governance Best Practices
Data Governance Best PracticesData Governance Best Practices
Data Governance Best PracticesDATAVERSITY
 
Collibra - Forrester Presentation : Data Governance 2.0
Collibra - Forrester Presentation : Data Governance 2.0Collibra - Forrester Presentation : Data Governance 2.0
Collibra - Forrester Presentation : Data Governance 2.0Guillaume LE GALIARD
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceDATAVERSITY
 
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...DATAVERSITY
 

La actualidad más candente (20)

Enterprise Data Management
Enterprise Data ManagementEnterprise Data Management
Enterprise Data Management
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital Transformation
 
EY + Neo4j: Why graph technology makes sense for fraud detection and customer...
EY + Neo4j: Why graph technology makes sense for fraud detection and customer...EY + Neo4j: Why graph technology makes sense for fraud detection and customer...
EY + Neo4j: Why graph technology makes sense for fraud detection and customer...
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
The Role of Data Governance in a Data Strategy
The Role of Data Governance in a Data StrategyThe Role of Data Governance in a Data Strategy
The Role of Data Governance in a Data Strategy
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data Architecture
 
Data Quality Best Practices
Data Quality Best PracticesData Quality Best Practices
Data Quality Best Practices
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data Catalog
 
Data Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and GovernanceData Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and Governance
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
DAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best PracticesDAS Slides: Data Quality Best Practices
DAS Slides: Data Quality Best Practices
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Data Governance — Aligning Technical and Business Approaches
Data Governance — Aligning Technical and Business ApproachesData Governance — Aligning Technical and Business Approaches
Data Governance — Aligning Technical and Business Approaches
 
Data Governance Best Practices
Data Governance Best PracticesData Governance Best Practices
Data Governance Best Practices
 
Collibra - Forrester Presentation : Data Governance 2.0
Collibra - Forrester Presentation : Data Governance 2.0Collibra - Forrester Presentation : Data Governance 2.0
Collibra - Forrester Presentation : Data Governance 2.0
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data Governance
 
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
Webinar: Decoding the Mystery - How to Know if You Need a Data Catalog, a Dat...
 

Similar a FAIR Data-centric Information Architecture.pptx

FAIRy stories: tales from building the FAIR Research Commons
FAIRy stories: tales from building the FAIR Research CommonsFAIRy stories: tales from building the FAIR Research Commons
FAIRy stories: tales from building the FAIR Research CommonsCarole Goble
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolutionitnewsafrica
 
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)Denodo
 
Customer value analysis of big data products
Customer value analysis of big data productsCustomer value analysis of big data products
Customer value analysis of big data productsVikas Sardana
 
Empowering your Enterprise with a Self-Service Data Marketplace (EMEA)
Empowering your Enterprise with a Self-Service Data Marketplace (EMEA)Empowering your Enterprise with a Self-Service Data Marketplace (EMEA)
Empowering your Enterprise with a Self-Service Data Marketplace (EMEA)Denodo
 
Enterprise Data Marketplace: A Centralized Portal for All Your Data Assets
Enterprise Data Marketplace: A Centralized Portal for All Your Data AssetsEnterprise Data Marketplace: A Centralized Portal for All Your Data Assets
Enterprise Data Marketplace: A Centralized Portal for All Your Data AssetsDenodo
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSPhilip Filleul
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
 
Supply Chain and Logistics Management with Graph & AI
Supply Chain and Logistics Management with Graph & AISupply Chain and Logistics Management with Graph & AI
Supply Chain and Logistics Management with Graph & AITigerGraph
 
Identity and User Access Management.pptx
Identity and User Access Management.pptxIdentity and User Access Management.pptx
Identity and User Access Management.pptxirfanullahkhan64
 
Connected development data
Connected development dataConnected development data
Connected development dataRob Worthington
 
Eu gdpr technical workflow and productionalization neccessary w privacy ass...
Eu gdpr technical workflow and productionalization   neccessary w privacy ass...Eu gdpr technical workflow and productionalization   neccessary w privacy ass...
Eu gdpr technical workflow and productionalization neccessary w privacy ass...Steven Meister
 
PatSeer Overview
PatSeer OverviewPatSeer Overview
PatSeer OverviewGridlogics
 
Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?
Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?
Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?Denodo
 
How a Logical Data Fabric Enhances the Customer 360 View
How a Logical Data Fabric Enhances the Customer 360 ViewHow a Logical Data Fabric Enhances the Customer 360 View
How a Logical Data Fabric Enhances the Customer 360 ViewDenodo
 

Similar a FAIR Data-centric Information Architecture.pptx (20)

FAIRy stories: tales from building the FAIR Research Commons
FAIRy stories: tales from building the FAIR Research CommonsFAIRy stories: tales from building the FAIR Research Commons
FAIRy stories: tales from building the FAIR Research Commons
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
 
Customer value analysis of big data products
Customer value analysis of big data productsCustomer value analysis of big data products
Customer value analysis of big data products
 
Empowering your Enterprise with a Self-Service Data Marketplace (EMEA)
Empowering your Enterprise with a Self-Service Data Marketplace (EMEA)Empowering your Enterprise with a Self-Service Data Marketplace (EMEA)
Empowering your Enterprise with a Self-Service Data Marketplace (EMEA)
 
Enterprise Data Marketplace: A Centralized Portal for All Your Data Assets
Enterprise Data Marketplace: A Centralized Portal for All Your Data AssetsEnterprise Data Marketplace: A Centralized Portal for All Your Data Assets
Enterprise Data Marketplace: A Centralized Portal for All Your Data Assets
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
 
Data Domain-Driven Design
Data Domain-Driven DesignData Domain-Driven Design
Data Domain-Driven Design
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Supply Chain and Logistics Management with Graph & AI
Supply Chain and Logistics Management with Graph & AISupply Chain and Logistics Management with Graph & AI
Supply Chain and Logistics Management with Graph & AI
 
Apache Solr vs Oracle Endeca
Apache Solr vs Oracle EndecaApache Solr vs Oracle Endeca
Apache Solr vs Oracle Endeca
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Identity and User Access Management.pptx
Identity and User Access Management.pptxIdentity and User Access Management.pptx
Identity and User Access Management.pptx
 
Connected development data
Connected development dataConnected development data
Connected development data
 
IT webinar 2016
IT webinar 2016IT webinar 2016
IT webinar 2016
 
Eu gdpr technical workflow and productionalization neccessary w privacy ass...
Eu gdpr technical workflow and productionalization   neccessary w privacy ass...Eu gdpr technical workflow and productionalization   neccessary w privacy ass...
Eu gdpr technical workflow and productionalization neccessary w privacy ass...
 
PatSeer Overview
PatSeer OverviewPatSeer Overview
PatSeer Overview
 
Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?
Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?
Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?
 
How a Logical Data Fabric Enhances the Customer 360 View
How a Logical Data Fabric Enhances the Customer 360 ViewHow a Logical Data Fabric Enhances the Customer 360 View
How a Logical Data Fabric Enhances the Customer 360 View
 
Dma unit 1
Dma unit   1Dma unit   1
Dma unit 1
 

Más de Ben Gardner

Delivering a Linked Data warehouse and realising the power of graphs
Delivering a Linked Data warehouse and realising the power of graphsDelivering a Linked Data warehouse and realising the power of graphs
Delivering a Linked Data warehouse and realising the power of graphsBen Gardner
 
Semantic blockchain
Semantic blockchainSemantic blockchain
Semantic blockchainBen Gardner
 
What AI is and examples of how it is used in legal
What AI is and examples of how it is used in legalWhat AI is and examples of how it is used in legal
What AI is and examples of how it is used in legalBen Gardner
 
From Search to Semantics
From Search to SemanticsFrom Search to Semantics
From Search to SemanticsBen Gardner
 
Practical semantics - An introduction
Practical semantics - An introductionPractical semantics - An introduction
Practical semantics - An introductionBen Gardner
 
From the Unknown to the Known
From the Unknown to the KnownFrom the Unknown to the Known
From the Unknown to the KnownBen Gardner
 
Enterprise wiki's: Does one size fit all?
Enterprise wiki's: Does one size fit all?Enterprise wiki's: Does one size fit all?
Enterprise wiki's: Does one size fit all?Ben Gardner
 
Stratergies for the intergration of information (IPI_ConfEX)
Stratergies for the intergration of information (IPI_ConfEX)Stratergies for the intergration of information (IPI_ConfEX)
Stratergies for the intergration of information (IPI_ConfEX)Ben Gardner
 

Más de Ben Gardner (9)

Delivering a Linked Data warehouse and realising the power of graphs
Delivering a Linked Data warehouse and realising the power of graphsDelivering a Linked Data warehouse and realising the power of graphs
Delivering a Linked Data warehouse and realising the power of graphs
 
Semantic blockchain
Semantic blockchainSemantic blockchain
Semantic blockchain
 
What AI is and examples of how it is used in legal
What AI is and examples of how it is used in legalWhat AI is and examples of how it is used in legal
What AI is and examples of how it is used in legal
 
From Search to Semantics
From Search to SemanticsFrom Search to Semantics
From Search to Semantics
 
Practical semantics - An introduction
Practical semantics - An introductionPractical semantics - An introduction
Practical semantics - An introduction
 
From the Unknown to the Known
From the Unknown to the KnownFrom the Unknown to the Known
From the Unknown to the Known
 
Enterprise wiki's: Does one size fit all?
Enterprise wiki's: Does one size fit all?Enterprise wiki's: Does one size fit all?
Enterprise wiki's: Does one size fit all?
 
meet Jessica
meet Jessicameet Jessica
meet Jessica
 
Stratergies for the intergration of information (IPI_ConfEX)
Stratergies for the intergration of information (IPI_ConfEX)Stratergies for the intergration of information (IPI_ConfEX)
Stratergies for the intergration of information (IPI_ConfEX)
 

Último

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Último (20)

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 

FAIR Data-centric Information Architecture.pptx

  • 1. FAIR Data-centric Information Architecture: It's all fun and games until someone uses an “I” Ben Gardner Data Standards & Interoperability, Data Office, Data Science & Artificial Intelligence, R&D, AstraZeneca, Cambridge, United Kingdom September 2023
  • 2. What is Data Centricity? 2 Data-Centricity puts data at the centre of the enterprise. Applications are optional visitors to the data. (Data-centric manifesto) Data-centricity involves structuring our data around the science that we do rather than the systems that we use. It promotes data reusability over system-centric design.
  • 3. Defining our data-centric transformation using FAIR metrics 3 L1 •Data is catalogued in situ •Raw data in an unconformed format, external to a data lake •Requires expert technical and subject matter knowledge to access and use L2 •Raw data catalogued and accessible with governance in a data lake, unconformed. •Requires average technical and expert subject matter knowledge to use L3 •Conformed data catalogued and available in a data lake in accordance with AZ patterns •Data is conformed on a system by system basis and may not map to a domain data model •Controls on access at the system level •Requires average level technical and subject matter knowledge to use: Data Scientist. L4 – Domain level FAIR •Data conformed, integrated, processed and audited to support specific analytics patterns/enquiries •Data is conformed using a domain level data model •Data embeds local master and reference data •Controls on access at the data level • Creation of analytics ready 'Marts' enabling self service analytics: Citizen Data Scientist L5 – Enterprise Level FAIR •Extend L4 further by: •Data is described in a cross domain data or industry model and is integrated in a Data Mesh •Data embeds enterprise master and reference data •Enterprise Metadata standard applied •URI’s and PURLs are implemented •Cross domain Analytics enabled: Citizen Data Scientist using all relevant AZ data L6 – Knowledge Level FAIR •Extend L5 further by: •Data is fully described using a knowledge language or ontology •Data is aggregated by business concepts and users can navigate from concept to concept •Automated AI enabled: AI and semantic solutions can act directly on data sets without need for interpretation •Knowledge enabled citizens Evolution of System Centric Thinking Evolution of Data Centric Thinking to Knowledge Centric Sweet spot for many AZ domains – where deep science and/or broad data integration is not required Some data domains will require additional investment to support scientific use cases. This maximises use of Enterprise data and fully supports AI Requires strong and increasing linkage of FAIR with TRUST System-centric Catalogs and API’s only, giving minimal FAIR capabilities Credit to Information Architecture and Ben Gardner
  • 4. The value of data centricity 4 Junk Yard Car Boot Sale Moroccan Street Market Local Market Hyper Market Amazon Prime • Random stuff by car • Ordered by seller • Sellers have assigned pitches • No insights about what other sellers have • Random level of quality i.e. first edition vs latest edition book • Randomly distributed stuff • Lots of effort to find stuff • Easy to miss what you are looking for • Only the 'owner' knows where stuff might be at best • Grouping by product category i.e. spices vs carpets vs etc • Improved level of quality • Seller can explain the provenance of the product • Good intentions but not really scalable • Well organised • Lots of categories • Well labelled items including provenance (made where) • Limited produce • Scale and organisation at the next level • Everything in once place • Store guide/map • Optimised for society/community • Specialist by product category • Data driven display/groupings - I.e. Christmas, summer vs winter; mining shopping patterns, context sensitivity • Click and collect service • Price comparison with other retailers • Digital, no longer physically constrained • Recommendation engines - other categories/products related to this, push information to you vs pull provides information on request • Scale of products and variety of manufactures • Fuzzy search - helps me find what I might be looking for - I'll know it when I find it searches • Amazon subscription services - schedule delivery, Dash buttons • Alexa - AI guided • Services - Music and video • Market place for other vendors Think about your shopping experience…. Credit to Graham Henry, Sandra McGarry, Kerstin Forsberg and Ben Gardner
  • 5. Building Knowledge Graphs A three legged stool 5 “Start with meaning” Dave McComb, Semantic Arts Stool built by Derek Scuffell
  • 6. Ontology Architecture overview 6 A conceptual model that provides the minimum linking of shared entities used across domain and application models. Entity Ontologies describe individual concepts and act as building blocks for importing into application ontologies. Application Ontologies are built to support a specific application. Honeycombs used to communicate concept of knowledge map, illustrate use case coverage and organic evolution. Credit to Jon Ison Conceptual Model Entity Ontology Application Ontology Honeycomb Honeycomb motif inspired by Dan Gschwend - Amgen’s Enterprise Data Fabric, DCAF 2021
  • 7. Anatomy of an Entity Ontology A model of a high level concept within a knowledge domain 7 Relationships Controlled Vocabularies Information (Data Properties)
  • 8. Strategy to organically grow a knowledge map Increasing the breadth, depth and complexity of questions enabled 8 Model evolves based on emerging new scientific use cases
  • 9. Foundational CV’s deliver Quality all the way up 9 Silo’ed picklists Collections Local CV Foundational CV Uncoordinated list of strings used by an application Well designed atomic controlled vocabulary built to a common standard Narrow coverage for a domain/application Decided by use case specific governance, enabled by editorial capability & appropriate tools Well designed atomic controlled vocabulary built to a common standard Broad coverage for multiple domains/applications Decided by SMEs, enabled by specialist curators A collection of terms derived from existing atomic controlled vocabularies that meet specific application needs/use cases SKOS-XL https://pid.astrazeneca.net/ref/cvname/{ID} Credit to Varsha Khodiyar
  • 10. Controlled Vocabularies SKOS-XL supports a multitude of labels and signifiers A pool of lexical labels exist for each concept. They are common use OR attributed to systems and vocabularies. AZ curators decide which one will be preferred (for AZ) and whether other labels will be alternative or hidden. Each label should be further characterized by a signifier. Alternative Labels Well defined non-case variant, alternative labels that are used for this concept. – some may call these “synonyms” Preferred Labels The preferred label for AstraZeneca, that resonates with the business vernacular Hidden Labels Common mis-representations (spelling mistakes, etc) of the concept that exist and we don’t want used by humans. Often used to support NLP and AI activity “ctDNA”@en C113243 Pref Label NCI Thesaurus Identifier Scheme “Circulating Tumor DNA” ->[NCIT] Alt Label (attributed) “circulating tumor DNA”@en-us Alt Label “cell-free circulating tumour DNA”@en Alt Label “ctdDNA”@en Alt Label “cDNA” -> Spelling Hidden Label “cell-free circulating tumor DNA”@en-us Alt Label we choose you! (as our preferred label) Credit to Nathalie Conte, Derek Scuffell & Jon Ison – OWG Summer project team
  • 11. Data Domain level FAIR (L4) a great first step 11 Data Catalogue Credit to – Data Office Collibra team Find - Data registered in Collibra and tagged with CMM Access – Access controlled via Collibra request service Reuse – Documentation captured against data
  • 12. But we are FAR from FAIR We MUST go further 12 Find - Data registered in Collibra and tagged with CMM Access – Access controlled via Collibra request service Reuse – Documentation captured against data Find - Data registered in Collibra and tagged with CMM Access – Access controlled via Collibra request service Interoperable – Data enhanced with shared CV and PIDs Reuse – Documentation captured against data, includes data dictionary, etc Data record discoverable in Collibra Data enriched to create interoperability Data is Machine Readable FAIR metrics (Level 4) FAIR metrics (Level 5) Data record discoverable in Collibra Data Catalog (Limited Scope) FAIR reference data FAIR master data Purl Server Controlled Vocabularies Data Catalog (Limited Scope)
  • 13. Enterprise Level FAIR (L5) A FAIRe(nough) data product 13 Data Controlled Vocabularies FAIR Data Product Findable Accessible Interoperable Reusable Registered and discoverable in a Data Catalogue Mechanism for requesting and receiving data The data has been aligned to AZ standards where they exist Documentation describing the constraints associated with using the data The PID for each instance in the standard is included to make the data machine readable Documentation describing the data i.e. data dictionary, schema, etc The minimum viable FAIR Data standard should deliver Catalogue
  • 14. Data and Controlled Vocabularies Putting Interoperability into FAIR 14 Data Transform Controlled Vocabularies Term prefLabel & PID Dirty data Interoperable data prefLabel PID • Inconsistent identifiers & terms • Column values can be concatenated • etc • Shared Controlled Vocabularies • Enrich with preferred label and PIDs • Uses common files format CSV, JSON, etc • Is machine readable, graph enabling and relationally world friendly • Well documented - Data Dictionary/Data Schema/etc
  • 15. L5 FAIR Data Products benefit all Inclusion of PIDs simplifies data integration irrespective of target data model 15 L5 FAIR Data Product Relational Graph
  • 16. Aligning entities with controlled vocabularies is key 16 Entity Ontology Application Ontology Build from Foundational Vocabularies Build from Collections Ontologies (things and relationships) Vocabularies (terms that categorise things) A model built from one or more entity ontologies and extended as required based on use case Discrete models of things within a domain that provide the foundational building blocks Atomic sets of terms that are used to describe the things identified within entity ontologies The collection of term sets and terms from one or more controlled vocabularies that are required to support a use case
  • 17. Emerging CV demand drives evolution of Entity Ontologies Critical Data Element and DDWG provide another path to enhancing our model 17 Credit to – Jon Ison, Arinjay Jadeja and Caroline Hellawell
  • 18. Model and Data Strings to things a mapping standard 18 Credit to – Jon Ison, Nicola Ellingham and Arinjay Jadeja
  • 19. Knowledge Level FAIR (L6) A FAIR data product minimises the gap between relational and graph worlds 19 L4 FAIR Data Controlled Vocabularies Reference Model RDF Serialisation Knowledge Map L5 FAIR Data Product • An enterprise standard • Adds value for everyone • Two thirds of the way to graph without impacting non-graph consumers
  • 20. Model, Controlled Vocabularies and Data In a data-centric world the PIDs just snap the data and model together 20
  • 21. Bringing it all back together 21 Controlled Vocabularies Reference Model FAIR Data Product Ontology Architecture • Scalable and sustainable • Reusable library of entity ontologies • Organic evolution of reference ontology Controlled Vocabularies • RDM service implemented • SKOS-XL based CV framework & workflow designed • Editorial controls in place • Prioritisation & creation of CV underway FAIR Data • FAIR metrics goal 80% Data FAIR by 2025 • Standards, resources, tooling and governance • FAIR assessment in place • PID server and standard patterns agreed
  • 22. Omics Imaging Patient Knowledge Map (Data-centric) Data is FAIR and aggregated by business concept (semantic) Platform Data Products (Platform-centric) Data is FAIR but fragmented across domains From System-centric to Data-centric Reference Architecture Application Data (System-centric) Data is aligned around processes
  • 23. What should I do? Focus on utilising our Controlled Vocabularies and enriching your data 23 Collections Local CV Foundational CV Catalogue your data Commit to reusing & improving our CV estate Enrich your data with prefLabel & PID Data Catalogue Controlled Vocabularies L5 FAIR Data Product
  • 24. It takes village …… 24 DF+I Daniel Roythorne Jon Ison Nathalie Conte Nicola Ellingham Arun Balaji Induja Mohan Arinjay Jadeja Ben Gardner Hans Ienasescu Bhavna Khilnani Michael Neylon Mathew Woodwark Rob Hernandez Collaborators Derek Scuffell Varsha Khodiyar Pablo Porras Millan John Berrisford Bijay Jassal Rafa Jimenez Philippe Rocca-Serra Victor Kim Alex Wood Linda Zander-Balderud Antonio Fabregat Mark Reuter Tom Plasterer James Holman Martina Devoti Stacy Mather Di Elvers Colin Wood Sandra Mc Garry Gareth Henry Kerstin Freberg Calle Nordmark
  • 25. Confidentiality Notice This file is private and may contain confidential and proprietary information. If you have received this file in error, please notify us and remove it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the contents of this file is not permitted and may be unlawful. AstraZeneca PLC, 1 Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge, CB2 0AA, UK, T: +44(0)203 749 5000, www.astrazeneca.com 25