Validator and preview 
for the JobPosting data model 
of Schema.org 
Jindřich Mynarz 
Department of Information and Knowledge Engineering,
University of Economics, Prague 
EC-WEB 2014, September 2, 2014
Motivation 
● Improve the usability of vocabularies
● Provide feedback on the use of vocabularies
● Make vocabulary specifications executable
● Help ensure a basic level of data quality
● Capture application-specific data requirements in validation rules
DámePráci.eu project 
“Matching jobs with unemployed through semantic data”
Data model using Schema.org with an extension for the job market.
Application for searching through job postings aggregated from distinct sources:
www.damepraci.cz (in Czech)
Validation method 
● Rule-based, schema-aware validation
● Operates in the RDF data model
● Focuses on semantic errors, beyond well-formed markup
● Partial open world assumption
● Implemented as SPARQL 1.1 CONSTRUCT queries
● Error reporting via the SPIN RDF vocabulary
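The validator runs each rule as a SPARQL 1.1 CONSTRUCT query that builds SPIN constraint violations. The Python sketch below only mimics that structure outside a triple store, under our own assumptions: a graph is a set of (subject, predicate, object) tuples with prefixed names as plain strings, and each rule is a function returning violation records whose fields loosely follow spin:ConstraintViolation (spin:violationRoot, spin:violationPath, rdfs:label). The names `Violation`, `validate` and `missing_job_posting` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

# A triple is (subject, predicate, object); prefixed names are plain strings.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Violation:
    """Loosely mirrors spin:ConstraintViolation (hypothetical record type)."""
    root: Optional[str]  # offending resource, cf. spin:violationRoot
    path: str            # property involved, cf. spin:violationPath
    message: str         # human-readable label, cf. rdfs:label


Rule = Callable[[Set[Triple]], List[Violation]]


def missing_job_posting(graph: Set[Triple]) -> List[Violation]:
    """Hypothetical rule: the page must describe at least one schema:JobPosting."""
    if any(p == "rdf:type" and o == "schema:JobPosting" for _, p, o in graph):
        return []
    return [Violation(None, "rdf:type", "No instance of schema:JobPosting found")]


def validate(graph: Set[Triple], rules: List[Rule]) -> List[Violation]:
    """Run every rule and merge the results, like unioning CONSTRUCT outputs."""
    violations: List[Violation] = []
    for rule in rules:
        violations.extend(rule(graph))
    return violations
```

In this sketch, adding a rule means adding one function, much as the real validator adds one query; rules stay independent, so their outputs can simply be concatenated.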
Background knowledge 
Schema.org
+ extension for job market (RDFS) 
+ external enumerations: 
● ISO 4217 currency codes (SKOS) 
● ISO 639-1 language codes (SKOS) 
Loaded in separate named graphs that the 
validation rules can reference.
Validation rules 
● Data completeness 
● Distinction between datatype and object properties
● Conflicting data 
● Datatype violations 
● Invalid codes
Data completeness 
● At least 1 instance of schema:JobPosting
● Other type information (class membership, datatypes) left optional
● Empty literals
● Conditionally required data (e.g., compensation + currency)
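A minimal Python sketch of the three completeness checks, not the validator's own SPARQL. Assumptions: literals are wrapped in a small `Lit` class while IRIs and blank nodes are plain strings; whitespace-only literals count as empty; and the "compensation + currency" pair is mapped to schema:baseSalary and schema:salaryCurrency, which the slide does not state explicitly.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Lit:
    """A literal object; IRIs and blank nodes are plain strings."""
    value: str


def completeness_errors(graph: Set[Tuple[str, str, object]]) -> List[tuple]:
    errors: List[tuple] = []
    postings = {s for s, p, o in graph if p == "rdf:type" and o == "schema:JobPosting"}
    # At least one schema:JobPosting is required.
    if not postings:
        errors.append(("missing-job-posting", None))
    # Empty literals (assumption: whitespace-only also counts as empty).
    for s, p, o in graph:
        if isinstance(o, Lit) and not o.value.strip():
            errors.append(("empty-literal", (s, p)))
    # Conditionally required: a salary needs a currency (assumed property names).
    for posting in postings:
        props = {p for s, p, _ in graph if s == posting}
        if "schema:baseSalary" in props and "schema:salaryCurrency" not in props:
            errors.append(("missing-currency", posting))
    return errors
```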
Distinction between datatype and object properties
● Object properties with literal objects instead of URIs or blank nodes (and vice versa for datatype properties)
● Likely cause: the simpler syntax of datatype properties
○ Avoids nested objects and the difficulty of finding an object's URI
● May be a symptom of incorrectly nested HTML elements
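The check can be sketched by comparing each object's kind (literal or resource) with the kind a property expects. This is an illustration only: the two property sets below are small hypothetical excerpts, whereas a real rule would derive them from the ranges in the background Schema.org graph. Care is needed there, because many Schema.org properties accept Text alongside a class.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Lit:
    """A literal object; IRIs and blank nodes are plain strings."""
    value: str


# Illustrative subsets only (hypothetical); not derived from the full vocabulary.
OBJECT_PROPERTIES = {"schema:jobLocation", "schema:hiringOrganization"}
DATATYPE_PROPERTIES = {"schema:title", "schema:datePosted", "schema:addressLocality"}


def property_kind_errors(graph: Set[Tuple[str, str, object]]) -> List[Tuple[str, str, str]]:
    errors = []
    for s, p, o in graph:
        if p in OBJECT_PROPERTIES and isinstance(o, Lit):
            # e.g. schema:jobLocation "Munich" instead of a nested Place
            errors.append(("object-property-with-literal", s, p))
        elif p in DATATYPE_PROPERTIES and not isinstance(o, Lit):
            # e.g. schema:title <#title> instead of a text value
            errors.append(("datatype-property-with-resource", s, p))
    return errors
```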
Conflicting data
● Mutually exclusive properties
○ schema:jobLocation + schema:isRemoteWork true
● Cardinality violations: functional properties with more than 1 object
○ schema:startDate, schema:currency, schema:availableVacancies
● Incompatible class memberships inferred from
○ schema:domainIncludes, schema:rangeIncludes
○ Incompatible class membership: a resource instantiates 2+ distinct classes that are not related by rdfs:subClassOf.
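The three conflict checks can be sketched together over a triple set. Assumptions, all ours: the class hierarchy is a tiny hypothetical excerpt, `Lit("true")` stands in for a typed boolean literal, and the types checked are those already asserted or inferred (the domainIncludes/rangeIncludes inference step itself is not shown).

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Lit:
    """A literal object; IRIs and blank nodes are plain strings."""
    value: str


# Properties the slide lists as functional (at most one object).
FUNCTIONAL = {"schema:startDate", "schema:currency", "schema:availableVacancies"}

# Toy class hierarchy, child -> direct parents (hypothetical excerpt).
SUBCLASS_OF: Dict[str, Set[str]] = {
    "schema:JobPosting": {"schema:Intangible"},
    "schema:Intangible": {"schema:Thing"},
    "schema:Place": {"schema:Thing"},
}


def ancestors(cls: str) -> Set[str]:
    """Transitive rdfs:subClassOf closure for one class."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def conflict_errors(graph: Set[Tuple[str, str, object]]) -> List[tuple]:
    by_subject: Dict[str, Dict[str, set]] = defaultdict(lambda: defaultdict(set))
    for s, p, o in graph:
        by_subject[s][p].add(o)
    errors: List[tuple] = []
    for s, props in by_subject.items():
        # Mutually exclusive: a job location together with remote work = true.
        if props.get("schema:jobLocation") and Lit("true") in props.get("schema:isRemoteWork", set()):
            errors.append(("location-vs-remote", s))
        # Functional properties with more than one distinct object.
        for p in sorted(FUNCTIONAL & set(props)):
            if len(props[p]) > 1:
                errors.append(("cardinality", s, p))
        # Pairs of types where neither is a subclass of the other.
        for a, b in combinations(sorted(props.get("rdf:type", set())), 2):
            if a not in ancestors(b) and b not in ancestors(a):
                errors.append(("incompatible-types", s, a, b))
    return errors
```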
Datatype violations 
● Detected via regular expressions and casting errors of XPath datatype constructor functions
● Date and time formats (xsd:date, xsd:duration)
○ Not conforming to regular expressions
○ Non-existent dates
○ Dates in the future
● Interval limits
○ Positive integers for schema:availableVacancies
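The date and interval checks map onto a regular-expression test followed by a cast. A minimal sketch, assuming a simplified xsd:date pattern (the full XSD pattern also allows negative and five-digit years) and a caller-supplied `today` so that the future-date check is deterministic; which properties must not hold future dates is left to the caller via the hypothetical `allow_future` flag.

```python
import re
from datetime import date
from typing import Optional

# Simplified xsd:date lexical form: YYYY-MM-DD plus an optional timezone.
XSD_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})(Z|[+-][0-9]{2}:[0-9]{2})?")
# xsd:positiveInteger lexical form: optional "+" followed by digits.
POSITIVE_INT = re.compile(r"\+?[0-9]+")


def check_date(lexical: str, today: date, allow_future: bool = True) -> Optional[str]:
    """Return an error code for an xsd:date literal, or None if it passes."""
    match = XSD_DATE.fullmatch(lexical)
    if match is None:
        return "not-xsd-date"            # fails the regular expression
    year, month, day = (int(g) for g in match.groups()[:3])
    try:
        value = date(year, month, day)   # analogous to a failing xsd:date() cast
    except ValueError:
        return "non-existent-date"       # e.g. 2014-02-30
    if not allow_future and value > today:
        return "date-in-future"
    return None


def check_vacancies(lexical: str) -> Optional[str]:
    """Interval check: schema:availableVacancies must be a positive integer."""
    if POSITIVE_INT.fullmatch(lexical) is None or int(lexical) < 1:
        return "not-positive-integer"
    return None
```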
Invalid codes 
● Based on lookup in code lists enumerating every valid code
● Includes language codes (ISO 639-1) and currency codes (ISO 4217)
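Code validation reduces to set membership against the enumerated code lists. In this sketch the code lists are tiny excerpts standing in for the full SKOS concept schemes, and the mapping from properties to code lists (`schema:salaryCurrency` to ISO 4217, `schema:inLanguage` to ISO 639-1) is our assumption rather than the validator's documented configuration.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Lit:
    """A literal object; IRIs and blank nodes are plain strings."""
    value: str


# Tiny excerpts of the code lists (the real ones enumerate every valid code).
CODE_LISTS = {
    "iso4217": {"CZK", "EUR", "USD"},
    "iso639-1": {"cs", "de", "en"},
}

# Assumed mapping from properties to the code list their values must come from.
PROPERTY_CODE_LIST = {
    "schema:salaryCurrency": "iso4217",
    "schema:inLanguage": "iso639-1",
}


def invalid_codes(graph: Set[Tuple[str, str, object]]) -> List[Tuple[str, str, str]]:
    errors = []
    for s, p, o in graph:
        scheme = PROPERTY_CODE_LIST.get(p)
        if scheme and isinstance(o, Lit) and o.value not in CODE_LISTS[scheme]:
            errors.append((s, p, o.value))
    return errors
```

For example, "cz" is rejected as a language code because ISO 639-1 uses "cs" for Czech.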
Implementation 
Ruby on Rails web application backed by an Apache Jena Fuseki SPARQL 1.1 endpoint.
● Validates both RDFa and HTML5 Microdata 
● Czech and English localization 
● Validation results in HTML or JSON-LD 
● RSpec tests for each validation rule 
● Open source: https://github.com/OPLZZ/job-posting-validator
Demo: bit.ly/broken-job-posting
Preview
Experimental validation of a JobPosting corpus
● 1332 seed URLs from 752 distinct pay-level domains (PLDs), obtained via a Google Custom Search Engine restricted to schema:JobPosting
● Sample of 42 872 web pages obtained by crawling the seed URLs
● Each page validated; validation results in JSON-LD loaded into Elasticsearch for exploration
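The statistics on the following slides (most common error types, most common path to each error, number of PLDs affected) are aggregations over the per-page results. The sketch below replaces Elasticsearch with in-memory counters and assumes a hypothetical result shape (`url`, plus `errors` with `type` and `path`). Its `naive_pld` helper keeps only the last two host labels, which is wrong for suffixes such as .co.uk; a real pay-level domain needs the Public Suffix List.

```python
from collections import Counter, defaultdict
from typing import Dict, List
from urllib.parse import urlsplit


def naive_pld(url: str) -> str:
    """Crude PLD approximation: the last two labels of the host name."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def summarize(results: List[dict]) -> Dict[str, dict]:
    """Per error type: count, most common path, number of distinct PLDs."""
    type_counts: Counter = Counter()
    paths: Dict[str, Counter] = defaultdict(Counter)
    plds: Dict[str, set] = defaultdict(set)
    for page in results:
        for error in page["errors"]:
            kind = error["type"]
            type_counts[kind] += 1
            paths[kind][error["path"]] += 1
            plds[kind].add(naive_pld(page["url"]))
    return {
        kind: {"count": n, "top_path": paths[kind].most_common(1)[0][0], "plds": len(plds[kind])}
        for kind, n in type_counts.most_common()
    }
```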
Most common errors
Datatype property used as object property
Most common path to error: schema:title
Possible cause: misunderstanding of RDFa markup precedence rules:
<a property="title" href="#title">SEO guru</a>
Produces: [] schema:title <#title> .
Intended: [] schema:title "SEO guru" .
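The precedence problem comes from how RDFa 1.1 picks the object of @property. The function below is a deliberately simplified subset of that processing step (it ignores @typeof, rdf:XMLLiteral/rdf:HTML and language handling), written as an illustration rather than a conforming RDFa processor; `property_object` is a hypothetical name.

```python
from typing import Dict, Tuple


def property_object(attrs: Dict[str, str], text: str) -> Tuple[str, str]:
    """Return ("literal", value) or ("iri", value) for an element with @property.

    Simplified: @datatype or @content yields a literal; otherwise, without
    @rel/@rev, the first of @resource, @href, @src becomes an IRI object;
    otherwise the element's text content is used as a literal.
    """
    if "datatype" in attrs or "content" in attrs:
        return ("literal", attrs.get("content", text))
    if "rel" not in attrs and "rev" not in attrs:
        for name in ("resource", "href", "src"):
            if name in attrs:
                return ("iri", attrs[name])
    return ("literal", text)
```

Under this rule, adding content="SEO guru" to the link, or moving @property onto an inner element without @href, yields the intended literal.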
Empty literal value 
Most common path to error: schema:addressRegion
Possible cause: HTML generated from fixed templates filled with incomplete data
Less common in manually marked-up HTML
Incorrect character case: schema:Postaladdress instead of schema:PostalAddress
Both RDFa and HTML5 Microdata are case-sensitive. 
Spread across 116 unique PLDs. 
“The default mode of authoring [Schema.org 
markup] is copy and edit.” — R.V. Guha
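Because both syntaxes are case-sensitive, a useful check flags unknown terms that match a known term case-insensitively and suggests the correct spelling. A sketch under stated assumptions: `SCHEMA_TERMS` is a small hypothetical excerpt of the vocabulary. It returns a list because Schema.org does contain terms differing only in case (the class Event and the property event).

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical excerpt of Schema.org term names (local names only).
SCHEMA_TERMS = {"PostalAddress", "JobPosting", "addressRegion", "jobLocation", "Event", "event"}

BY_LOWERCASE: Dict[str, List[str]] = defaultdict(list)
for term in sorted(SCHEMA_TERMS):
    BY_LOWERCASE[term.lower()].append(term)


def case_suggestions(term: str) -> List[str]:
    """Return [] for a known term, else known terms differing only in case."""
    if term in SCHEMA_TERMS:
        return []
    return list(BY_LOWERCASE.get(term.lower(), []))
```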
Object property used as datatype property
Most common path to error: schema:jobLocation
Common cause: simpler markup without intermediate resources
Incorrect (literal value):
<p property="jobLocation">
  Munich
</p>
Correct (nested resources):
<div rel="jobLocation">
  <div rel="address">
    <p property="addressLocality">
      Munich
    </p>
  </div>
</div>
Unsuccessful experiments 
Web Data Commons 
● Errors smoothed over by the extraction to RDF
● Not suitable as a source of seed URLs: job postings disappear quickly
Veterans Job Bank
● Data from few PLDs, lacking variety
● Severe restrictions on automated downloads through its API
Questions? 
Acknowledgements: 
The presented research was partially supported by the project of the Operational Programme Human Resources and Employment no. CZ.1.04/5.1.01/77.00440.
Image credits:
Check List designed by Arthur Shlain from thenounproject.com
Puzzle designed by John from thenounproject.com

Más contenido relacionado

La actualidad más candente

The Bounties of Semantic Data Integration for the Enterprise
The Bounties of Semantic Data Integration for the Enterprise The Bounties of Semantic Data Integration for the Enterprise
The Bounties of Semantic Data Integration for the Enterprise Ontotext
 
Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLioDo it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLioOpen Knowledge Belgium
 
Linked Data for Czech Legislation
Linked Data for Czech LegislationLinked Data for Czech Legislation
Linked Data for Czech LegislationMartin Necasky
 
Semantic Web Challenges for Visualisation and Visual Analytics
Semantic Web Challenges for Visualisation and Visual AnalyticsSemantic Web Challenges for Visualisation and Visual Analytics
Semantic Web Challenges for Visualisation and Visual AnalyticsAlan Dix
 
Semantic Technologies in ST&DL
Semantic Technologies in ST&DLSemantic Technologies in ST&DL
Semantic Technologies in ST&DLAndrea Nuzzolese
 
Getting Started with Knowledge Graphs
Getting Started with Knowledge GraphsGetting Started with Knowledge Graphs
Getting Started with Knowledge GraphsPeter Haase
 
Enterprise knowledge graphs
Enterprise knowledge graphsEnterprise knowledge graphs
Enterprise knowledge graphsSören Auer
 
Deriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF DataDeriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF DataGraph-TA
 
A Semantic Data Model for Web Applications
A Semantic Data Model for Web ApplicationsA Semantic Data Model for Web Applications
A Semantic Data Model for Web ApplicationsArmin Haller
 
Ephedra: efficiently combining RDF data and services using SPARQL federation
Ephedra: efficiently combining RDF data and services using SPARQL federationEphedra: efficiently combining RDF data and services using SPARQL federation
Ephedra: efficiently combining RDF data and services using SPARQL federationPeter Haase
 
Let your data shine... with OpenRefine
Let your data shine... with OpenRefineLet your data shine... with OpenRefine
Let your data shine... with OpenRefineOpen Knowledge Belgium
 
LODStats (Presentation for KESW2013 System Demo)
LODStats (Presentation for KESW2013 System Demo)LODStats (Presentation for KESW2013 System Demo)
LODStats (Presentation for KESW2013 System Demo)Ivan Ermilov
 
Linked Data Experiences at Springer Nature
Linked Data Experiences at Springer NatureLinked Data Experiences at Springer Nature
Linked Data Experiences at Springer NatureMichele Pasin
 
Towards digitizing scholarly communication
Towards digitizing scholarly communicationTowards digitizing scholarly communication
Towards digitizing scholarly communicationSören Auer
 
Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020Ontotext
 
Annotating Scholarly Works - the W3C Open Annotation Model
Annotating Scholarly Works - the W3C Open Annotation ModelAnnotating Scholarly Works - the W3C Open Annotation Model
Annotating Scholarly Works - the W3C Open Annotation ModelRobert Sanderson
 

La actualidad más candente (20)

The Bounties of Semantic Data Integration for the Enterprise
The Bounties of Semantic Data Integration for the Enterprise The Bounties of Semantic Data Integration for the Enterprise
The Bounties of Semantic Data Integration for the Enterprise
 
Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLioDo it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
 
Linked Data for Czech Legislation
Linked Data for Czech LegislationLinked Data for Czech Legislation
Linked Data for Czech Legislation
 
Semantic Web Challenges for Visualisation and Visual Analytics
Semantic Web Challenges for Visualisation and Visual AnalyticsSemantic Web Challenges for Visualisation and Visual Analytics
Semantic Web Challenges for Visualisation and Visual Analytics
 
Semantic Technologies in ST&DL
Semantic Technologies in ST&DLSemantic Technologies in ST&DL
Semantic Technologies in ST&DL
 
Getting Started with Knowledge Graphs
Getting Started with Knowledge GraphsGetting Started with Knowledge Graphs
Getting Started with Knowledge Graphs
 
Freire model api
Freire model apiFreire model api
Freire model api
 
Enterprise knowledge graphs
Enterprise knowledge graphsEnterprise knowledge graphs
Enterprise knowledge graphs
 
Deriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF DataDeriving an Emergent Relational Schema from RDF Data
Deriving an Emergent Relational Schema from RDF Data
 
A Semantic Data Model for Web Applications
A Semantic Data Model for Web ApplicationsA Semantic Data Model for Web Applications
A Semantic Data Model for Web Applications
 
Ephedra: efficiently combining RDF data and services using SPARQL federation
Ephedra: efficiently combining RDF data and services using SPARQL federationEphedra: efficiently combining RDF data and services using SPARQL federation
Ephedra: efficiently combining RDF data and services using SPARQL federation
 
McDanold-1-jun15
McDanold-1-jun15McDanold-1-jun15
McDanold-1-jun15
 
Let your data shine... with OpenRefine
Let your data shine... with OpenRefineLet your data shine... with OpenRefine
Let your data shine... with OpenRefine
 
LODStats (Presentation for KESW2013 System Demo)
LODStats (Presentation for KESW2013 System Demo)LODStats (Presentation for KESW2013 System Demo)
LODStats (Presentation for KESW2013 System Demo)
 
Linked Data Experiences at Springer Nature
Linked Data Experiences at Springer NatureLinked Data Experiences at Springer Nature
Linked Data Experiences at Springer Nature
 
Towards digitizing scholarly communication
Towards digitizing scholarly communicationTowards digitizing scholarly communication
Towards digitizing scholarly communication
 
Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020
 
Library Linked Data and the Future of Bibliographic Control
Library Linked Data and the Future of Bibliographic ControlLibrary Linked Data and the Future of Bibliographic Control
Library Linked Data and the Future of Bibliographic Control
 
Lauruhn-5-jun15
Lauruhn-5-jun15Lauruhn-5-jun15
Lauruhn-5-jun15
 
Annotating Scholarly Works - the W3C Open Annotation Model
Annotating Scholarly Works - the W3C Open Annotation ModelAnnotating Scholarly Works - the W3C Open Annotation Model
Annotating Scholarly Works - the W3C Open Annotation Model
 

Destacado

Apresentaçao swing crash
Apresentaçao swing crashApresentaçao swing crash
Apresentaçao swing crashTiago Malheiros
 
Pitch Like a Boss
Pitch Like a BossPitch Like a Boss
Pitch Like a BossInês Silva
 
Agent Eighteen 2010 Mockup
Agent Eighteen 2010 MockupAgent Eighteen 2010 Mockup
Agent Eighteen 2010 MockupIvo Gomes
 
Funding ideas in a globally connected world – a social approach
Funding ideas in a globally connected world – a social approachFunding ideas in a globally connected world – a social approach
Funding ideas in a globally connected world – a social approachTomé Duarte
 
Talk ja ye-nuno_freitas_1set2012
Talk ja ye-nuno_freitas_1set2012Talk ja ye-nuno_freitas_1set2012
Talk ja ye-nuno_freitas_1set2012Nuno Freitas
 
Scheduled releases @ Commit Porto 2016
Scheduled releases @ Commit Porto 2016Scheduled releases @ Commit Porto 2016
Scheduled releases @ Commit Porto 2016Fábio Oliveira
 
Apresentação
ApresentaçãoApresentação
ApresentaçãoPedro Bré
 
From LIBEs’ framework to users experience of LIBE courses: analysing the Port...
From LIBEs’ framework to users experience of LIBE courses: analysing the Port...From LIBEs’ framework to users experience of LIBE courses: analysing the Port...
From LIBEs’ framework to users experience of LIBE courses: analysing the Port...Marta Pinto
 
Delivering presentations - dicas de apresentação (not!)
Delivering presentations - dicas de apresentação (not!)Delivering presentations - dicas de apresentação (not!)
Delivering presentations - dicas de apresentação (not!)Pedro Moura
 
Barriers to the diffusion of the VSM (Nuno Rosa, 2016)
Barriers to the diffusion of the VSM (Nuno Rosa, 2016)Barriers to the diffusion of the VSM (Nuno Rosa, 2016)
Barriers to the diffusion of the VSM (Nuno Rosa, 2016)Nuno Rosa
 
Customer Development - Entrepreneurs Break
Customer Development - Entrepreneurs BreakCustomer Development - Entrepreneurs Break
Customer Development - Entrepreneurs BreakPedro Oliveira
 
Launching tech products
Launching tech productsLaunching tech products
Launching tech productsSérgio Santos
 

Destacado (20)

Apresentaçao swing crash
Apresentaçao swing crashApresentaçao swing crash
Apresentaçao swing crash
 
Pensar Digital
Pensar DigitalPensar Digital
Pensar Digital
 
Pitch Like a Boss
Pitch Like a BossPitch Like a Boss
Pitch Like a Boss
 
Agent Eighteen 2010 Mockup
Agent Eighteen 2010 MockupAgent Eighteen 2010 Mockup
Agent Eighteen 2010 Mockup
 
Bash Introduction
Bash IntroductionBash Introduction
Bash Introduction
 
Prosolvers CH
Prosolvers CHProsolvers CH
Prosolvers CH
 
Incubate Camp 2nd
Incubate Camp 2ndIncubate Camp 2nd
Incubate Camp 2nd
 
Funding ideas in a globally connected world – a social approach
Funding ideas in a globally connected world – a social approachFunding ideas in a globally connected world – a social approach
Funding ideas in a globally connected world – a social approach
 
Talk ja ye-nuno_freitas_1set2012
Talk ja ye-nuno_freitas_1set2012Talk ja ye-nuno_freitas_1set2012
Talk ja ye-nuno_freitas_1set2012
 
Scheduled releases @ Commit Porto 2016
Scheduled releases @ Commit Porto 2016Scheduled releases @ Commit Porto 2016
Scheduled releases @ Commit Porto 2016
 
Set n'match
Set n'matchSet n'match
Set n'match
 
GoClapp Pitch deck v2.0
GoClapp Pitch deck v2.0GoClapp Pitch deck v2.0
GoClapp Pitch deck v2.0
 
Apresentação
ApresentaçãoApresentação
Apresentação
 
From LIBEs’ framework to users experience of LIBE courses: analysing the Port...
From LIBEs’ framework to users experience of LIBE courses: analysing the Port...From LIBEs’ framework to users experience of LIBE courses: analysing the Port...
From LIBEs’ framework to users experience of LIBE courses: analysing the Port...
 
Delivering presentations - dicas de apresentação (not!)
Delivering presentations - dicas de apresentação (not!)Delivering presentations - dicas de apresentação (not!)
Delivering presentations - dicas de apresentação (not!)
 
Niiiws short
Niiiws short Niiiws short
Niiiws short
 
Barriers to the diffusion of the VSM (Nuno Rosa, 2016)
Barriers to the diffusion of the VSM (Nuno Rosa, 2016)Barriers to the diffusion of the VSM (Nuno Rosa, 2016)
Barriers to the diffusion of the VSM (Nuno Rosa, 2016)
 
Customer Development - Entrepreneurs Break
Customer Development - Entrepreneurs BreakCustomer Development - Entrepreneurs Break
Customer Development - Entrepreneurs Break
 
Launching tech products
Launching tech productsLaunching tech products
Launching tech products
 
Beta start @ beside
Beta start @ besideBeta start @ beside
Beta start @ beside
 

Similar a EC-WEB: Validator and Preview for the JobPosting Data Model of Schema.org

Scalable database, Scalable language @ JDC 2013
Scalable database, Scalable language @ JDC 2013Scalable database, Scalable language @ JDC 2013
Scalable database, Scalable language @ JDC 2013Maciek Próchniak
 
Semantic framework for web scraping.
Semantic framework for web scraping.Semantic framework for web scraping.
Semantic framework for web scraping.Shyjal Raazi
 
Stream processing: The Matrix Revolutions
Stream processing: The Matrix RevolutionsStream processing: The Matrix Revolutions
Stream processing: The Matrix RevolutionsRomanaPernischov
 
SELF - Becoming a Rails Developer - The Rest of the Story
SELF - Becoming a Rails Developer - The Rest of the StorySELF - Becoming a Rails Developer - The Rest of the Story
SELF - Becoming a Rails Developer - The Rest of the StoryNathanial McConnell
 
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...DataScienceConferenc1
 
Www Search Engine But Not In Perl
Www Search Engine But Not In PerlWww Search Engine But Not In Perl
Www Search Engine But Not In PerlKonstantin Ivinsky
 
Semantika Introduction
Semantika IntroductionSemantika Introduction
Semantika IntroductionJosef Hardi
 
Introduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastIntroduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastHolden Karau
 
Graph databases & data integration v2
Graph databases & data integration v2Graph databases & data integration v2
Graph databases & data integration v2Dimitris Kontokostas
 
Introduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLIntroduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLdatamantra
 

Similar a EC-WEB: Validator and Preview for the JobPosting Data Model of Schema.org (20)

Scalable database, Scalable language @ JDC 2013
Scalable database, Scalable language @ JDC 2013Scalable database, Scalable language @ JDC 2013
Scalable database, Scalable language @ JDC 2013
 
Semantic framework for web scraping.
Semantic framework for web scraping.Semantic framework for web scraping.
Semantic framework for web scraping.
 
API
APIAPI
API
 
Stream processing: The Matrix Revolutions
Stream processing: The Matrix RevolutionsStream processing: The Matrix Revolutions
Stream processing: The Matrix Revolutions
 
SELF - Becoming a Rails Developer - The Rest of the Story
SELF - Becoming a Rails Developer - The Rest of the StorySELF - Becoming a Rails Developer - The Rest of the Story
SELF - Becoming a Rails Developer - The Rest of the Story
 
Chado-XML
Chado-XMLChado-XML
Chado-XML
 
MongoDB Basics
MongoDB BasicsMongoDB Basics
MongoDB Basics
 
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...
[DSC Europe 23] Djordje Grozdic - Transforming Business Process Automation wi...
 
JSON-LD Update
JSON-LD UpdateJSON-LD Update
JSON-LD Update
 
Linked services
Linked servicesLinked services
Linked services
 
JSON-LD and SHACL for Knowledge Graphs
JSON-LD and SHACL for Knowledge GraphsJSON-LD and SHACL for Knowledge Graphs
JSON-LD and SHACL for Knowledge Graphs
 
Www Search Engine But Not In Perl
Www Search Engine But Not In PerlWww Search Engine But Not In Perl
Www Search Engine But Not In Perl
 
L18 Object Relational Mapping
L18 Object Relational MappingL18 Object Relational Mapping
L18 Object Relational Mapping
 
Semantika Introduction
Semantika IntroductionSemantika Introduction
Semantika Introduction
 
Introduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastIntroduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at last
 
NLP and the Web
NLP and the WebNLP and the Web
NLP and the Web
 
Graph databases & data integration v2
Graph databases & data integration v2Graph databases & data integration v2
Graph databases & data integration v2
 
Neo4j graph database
Neo4j graph databaseNeo4j graph database
Neo4j graph database
 
Node js crash course session 5
Node js crash course   session 5Node js crash course   session 5
Node js crash course session 5
 
Introduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLIntroduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQL
 

Último

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 

Último (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 

EC-WEB: Validator and Preview for the JobPosting Data Model of Schema.org

  • 1. Validator and preview for the JobPosting data model of Schema.org Jindřich Mynarz Department of Information and Knowledge Engineering, University of Economics, Prague EC-WEB 2014, September 2, 2014
  • 2. Motivation ● Improving usability of vocabularies ● Provide feedback on the use of vocabularies ● Make vocabulary specification executable ● Help ensure basic level of data quality ● Capture application-specific requirements for data in validation rules
  • 3. DámePráci.eu project “Matching jobs with unemployed through semantic data” Data model using Schema.org with an extension for the job market. Application for searching through job postings aggregated from distinct sources: www.damepraci.cz (in Czech)
  • 4. Validation method ● Rule-based, schema-aware validation ● Operates in the RDF data model ● Focuses on semantic errors, beyond well-formed markup ● Partial open world assumption ● Implemented as SPARQL 1.1 CONSTRUCT queries ● Error reporting via SPIN RDF vocabulary
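The rule-based approach above can be illustrated with a minimal Python sketch. The validator itself runs SPARQL 1.1 CONSTRUCT queries; here, as a simplifying assumption, each rule is a plain function over an in-memory set of triples, and every error is shaped like a spin:ConstraintViolation node (spin:violationRoot, spin:violationPath, rdfs:label) in expanded JSON-LD. The helper names (`violation`, `empty_literal_rule`, `validate`) are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple, Union


@dataclass(frozen=True)
class Literal:
    """An RDF literal; plain str terms stand for IRIs or blank nodes."""
    value: str


Term = Union[str, Literal]
Graph = Set[Tuple[str, str, Term]]
# A rule stands in for one SPARQL CONSTRUCT query: graph in, violations out.
Rule = Callable[[Graph], List[Dict]]

SPIN = "http://spinrdf.org/spin#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SCHEMA = "http://schema.org/"


def violation(root: str, path: str, label: str) -> Dict:
    """Describe one error as a spin:ConstraintViolation node in expanded JSON-LD."""
    return {
        "@type": SPIN + "ConstraintViolation",
        SPIN + "violationRoot": {"@id": root},
        SPIN + "violationPath": {"@id": path},
        RDFS + "label": label,
    }


def empty_literal_rule(graph: Graph) -> List[Dict]:
    """Example rule: flag literals that are empty or whitespace-only."""
    return [violation(s, p, "Empty literal value")
            for s, p, o in graph
            if isinstance(o, Literal) and not o.value.strip()]


def validate(graph: Graph, rules: List[Rule]) -> Dict:
    """Run every rule and gather all violations into one JSON-LD @graph."""
    return {"@graph": [v for rule in rules for v in rule(graph)]}


data = {("_:job", SCHEMA + "title", Literal("SEO guru")),
        ("_:addr", SCHEMA + "addressRegion", Literal("  "))}
print(json.dumps(validate(data, [empty_literal_rule]), indent=2))
```

Keeping each rule independent mirrors the one-query-per-rule design, so rules can be added or tested separately.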
  • 5. Background knowledge schema.org + extension for job market (RDFS) + external enumerations: ● ISO 4217 currency codes (SKOS) ● ISO 639-1 language codes (SKOS) Loaded in separate named graphs that the validation rules can reference.
  • 6. Validation rules ● Data completeness ● Distinction between datatype and object properties ● Conflicting data ● Datatype violations ● Invalid codes
  • 7. Data completeness ● At least 1 instance of schema:JobPosting ● Other type information (class membership, datatypes) left optional ● Empty literals ● Conditionally required data (e.g., compensation + currency)
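Two of the completeness checks could be sketched as follows. The slide says only "compensation + currency"; using schema:baseSalary and schema:salaryCurrency for that pair is an assumption, and literals are plain strings to keep the sketch short.

```python
from typing import List, Set, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "http://schema.org/"
# Literal objects are plain strings here to keep the sketch short.
Triple = Tuple[str, str, str]


def subjects_with(graph: Set[Triple], prop: str) -> Set[str]:
    """All subjects that have at least one value for prop."""
    return {s for s, p, _ in graph if p == prop}


def check_completeness(graph: Set[Triple]) -> List[str]:
    errors = []
    # Rule 1: the data must contain at least one schema:JobPosting.
    if not any(p == RDF_TYPE and o == SCHEMA + "JobPosting" for _, p, o in graph):
        errors.append("No instance of schema:JobPosting found")
    # Rule 2 (conditionally required data): a salary needs a currency.
    # The property names baseSalary/salaryCurrency are an assumption.
    missing = (subjects_with(graph, SCHEMA + "baseSalary")
               - subjects_with(graph, SCHEMA + "salaryCurrency"))
    for s in sorted(missing):
        errors.append(f"{s}: schema:baseSalary without schema:salaryCurrency")
    return errors


data = {("_:job", RDF_TYPE, SCHEMA + "JobPosting"),
        ("_:job", SCHEMA + "baseSalary", "40000")}
print(check_completeness(data))
```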
  • 8. Distinction between datatype and object properties ● Object properties with literal objects instead of URIs or blank nodes (and vice versa for datatype properties) ● Simpler syntax of datatype properties ○ Avoiding nested objects or difficulties with finding an object's URI ● May be a symptom of incorrectly nested HTML elements
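A sketch of the datatype/object property check. The two property sets are hand-picked and hypothetical; a real validator could derive them from the background knowledge (for example from schema:rangeIncludes). The example triples mirror the errors reported on slides 17 and 20.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple, Union


@dataclass(frozen=True)
class Literal:
    value: str


Term = Union[str, Literal]  # a plain str is an IRI or blank node
SCHEMA = "http://schema.org/"

# Hypothetical, hand-picked property sets for illustration only.
OBJECT_PROPERTIES = {SCHEMA + "jobLocation", SCHEMA + "hiringOrganization"}
DATATYPE_PROPERTIES = {SCHEMA + "title", SCHEMA + "addressRegion"}


def check_property_kinds(graph: Set[Tuple[str, str, Term]]) -> List[str]:
    errors = []
    for s, p, o in sorted(graph, key=str):  # sorted for deterministic output
        if p in OBJECT_PROPERTIES and isinstance(o, Literal):
            errors.append(f"Object property {p} used with literal {o.value!r}")
        elif p in DATATYPE_PROPERTIES and not isinstance(o, Literal):
            errors.append(f"Datatype property {p} used with resource {o}")
    return errors


data = {("_:job", SCHEMA + "jobLocation", Literal("Munich")),
        ("_:job", SCHEMA + "title", "#title")}
for error in check_property_kinds(data):
    print(error)
```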
  • 9. Conflicting data ● Mutually exclusive properties ○ schema:jobLocation + schema:isRemoteWork true ● Cardinality violations: functional properties with more than one object ○ schema:startDate, schema:currency, schema:availableVacancies ● Incompatible class membership inferences ○ schema:domainIncludes, schema:rangeIncludes ○ Incompatible class membership is the instantiation of two or more distinct classes that are not related by rdfs:subClassOf.
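The three kinds of conflict could be approximated in plain Python as below. This sketch checks only asserted rdf:type statements, not class memberships inferred via schema:domainIncludes and schema:rangeIncludes, and the class hierarchy passed in is a hand-picked excerpt. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "http://schema.org/"
# Functional properties named on the slide: at most one value each.
FUNCTIONAL = {SCHEMA + p for p in ("startDate", "currency", "availableVacancies")}
Triple = Tuple[str, str, str]


def ancestors(cls: str, sub_class_of: Dict[str, Set[str]]) -> Set[str]:
    """cls together with all of its transitive rdfs:subClassOf superclasses."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in sub_class_of.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def check_conflicts(graph: Set[Triple], sub_class_of: Dict[str, Set[str]]) -> List[str]:
    errors = []
    values: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for s, p, o in graph:
        values[(s, p)].add(o)
    for (s, p), objs in sorted(values.items(), key=lambda kv: kv[0]):
        if p in FUNCTIONAL and len(objs) > 1:
            errors.append(f"{s}: {p} has {len(objs)} values")
        if p == RDF_TYPE:
            # Two asserted classes conflict unless one is a subclass of the other.
            for a, b in combinations(sorted(objs), 2):
                if (a not in ancestors(b, sub_class_of)
                        and b not in ancestors(a, sub_class_of)):
                    errors.append(f"{s}: incompatible classes {a} and {b}")
    # Mutually exclusive properties: a remote job should not have a location.
    remote = {s for s, p, o in graph if p == SCHEMA + "isRemoteWork" and o == "true"}
    located = {s for s, p, _ in graph if p == SCHEMA + "jobLocation"}
    for s in sorted(remote & located):
        errors.append(f"{s}: schema:jobLocation conflicts with schema:isRemoteWork true")
    return errors


hierarchy = {SCHEMA + "JobPosting": {SCHEMA + "Intangible"}}
data = {("_:job", RDF_TYPE, SCHEMA + "JobPosting"),
        ("_:job", RDF_TYPE, SCHEMA + "Place"),
        ("_:job", SCHEMA + "startDate", "2014-09-01"),
        ("_:job", SCHEMA + "startDate", "2014-10-01"),
        ("_:job", SCHEMA + "isRemoteWork", "true"),
        ("_:job", SCHEMA + "jobLocation", "_:loc")}
for error in check_conflicts(data, hierarchy):
    print(error)
```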
  • 10. Datatype violations ● Detected via regular expressions and via casting errors of XPath datatype constructor functions ● Date and time formats (xsd:date, xsd:duration) ○ Not conforming to regular expressions ○ Non-existent dates ○ Dates from the future ● Interval limits ○ Positive integers for schema:availableVacancies
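A minimal sketch of these datatype checks, with Python's `date.fromisoformat` standing in for the xsd:date cast: a regular expression screens the lexical form, a failed cast signals a non-existent date, and a comparison with today's date catches dates from the future. Applying the future-date check to every date is an assumption; the slide does not say which properties it covers.

```python
import re
from datetime import date
from typing import Optional

# Lexical form YYYY-MM-DD only; xsd:date also allows negative or 5+-digit
# years and a timezone, which this sketch leaves out.
DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
INTEGER_PATTERN = re.compile(r"[+-]?[0-9]+")


def check_date(value: str, today: Optional[date] = None) -> Optional[str]:
    """Return an error message for an invalid date, or None if it passes."""
    if not DATE_PATTERN.fullmatch(value):
        return "does not match the xsd:date pattern"
    try:
        parsed = date.fromisoformat(value)  # the "cast" step
    except ValueError:
        return "non-existent date"
    if parsed > (today if today is not None else date.today()):
        return "date from the future"
    return None


def check_positive_integer(value: str) -> Optional[str]:
    """Interval check, e.g. for schema:availableVacancies."""
    if not INTEGER_PATTERN.fullmatch(value):
        return "not an integer"
    return None if int(value) > 0 else "not a positive integer"


today = date(2014, 9, 2)
for value in ("2014-02-30", "2/9/2014", "2015-01-01", "2014-09-01"):
    print(value, check_date(value, today))
print(check_positive_integer("0"))
```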
  • 11. Invalid codes ● Based on lookup in code lists enumerating every valid code ● Includes language codes (ISO 639-1) and currency codes (ISO 4217)
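Code validation reduces to set membership. In this sketch the code lists are tiny excerpts rather than the full ISO 4217 and ISO 639-1 lists the validator loads as SKOS, and mapping schema:salaryCurrency and schema:inLanguage to them is an assumption.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

SCHEMA = "http://schema.org/"

# Tiny excerpts standing in for the full ISO 4217 and ISO 639-1 code lists.
# The property-to-code-list mapping is an assumption made for this sketch.
CODE_LISTS: Dict[str, FrozenSet[str]] = {
    SCHEMA + "salaryCurrency": frozenset({"CZK", "EUR", "USD"}),
    SCHEMA + "inLanguage": frozenset({"cs", "de", "en"}),
}


def check_codes(graph: Set[Tuple[str, str, str]]) -> List[str]:
    """Flag values of coded properties that are missing from their code list."""
    return [f"{s}: {o!r} is not a valid code for {p}"
            for s, p, o in sorted(graph)
            if p in CODE_LISTS and o not in CODE_LISTS[p]]


data = {("_:job", SCHEMA + "salaryCurrency", "Kč"),
        ("_:job", SCHEMA + "inLanguage", "cs")}
print(check_codes(data))
```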
  • 12. Implementation Ruby on Rails web application backed by a Jena Fuseki SPARQL 1.1 endpoint. ● Validates both RDFa and HTML5 Microdata ● Czech and English localization ● Validation results in HTML or JSON-LD ● RSpec tests for each validation rule ● Open source: https://github.com/OPLZZ/job-posting-validator
  • 15. Experimental validation of a JobPosting corpus ● 1332 seed URLs from 752 distinct pay-level domains (PLDs) obtained via a Google Custom Search Engine restricted to schema:JobPosting ● Sample of 42,872 web pages obtained by crawling the seed URLs ● Each page validated; validation results in JSON-LD loaded into Elasticsearch for exploration
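Per-error summaries such as the "most common path to error" and the number of unique PLDs reported on slides 17 to 20 could be computed from the validation results roughly as follows. The study explored the results in Elasticsearch; this in-memory version and its record fields (`pld`, `error`, `path`) are hypothetical, and the sample records are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def summarize(results: Iterable[Dict[str, str]]) -> Dict[str, Tuple[str, int]]:
    """Map each error type to (most common path, number of distinct PLDs)."""
    paths: Dict[str, Counter] = defaultdict(Counter)
    plds: Dict[str, Set[str]] = defaultdict(set)
    for result in results:
        paths[result["error"]][result["path"]] += 1
        plds[result["error"]].add(result["pld"])
    return {error: (counter.most_common(1)[0][0], len(plds[error]))
            for error, counter in paths.items()}


# Made-up records for illustration only; they are not data from the study.
sample = [
    {"pld": "example.com", "error": "empty-literal", "path": "schema:addressRegion"},
    {"pld": "example.org", "error": "empty-literal", "path": "schema:addressRegion"},
    {"pld": "example.org", "error": "empty-literal", "path": "schema:streetAddress"},
    {"pld": "example.net", "error": "literal-for-object", "path": "schema:jobLocation"},
]
print(summarize(sample))
```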
  • 17. Datatype property used as object property Most common path to error: schema:title Possible cause: incorrect understanding of markup precedence rules. In RDFa, <a property="title" href="#title">SEO guru</a> yields [] schema:title <#title> . instead of the intended [] schema:title "SEO guru" .
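The precedence rule behind this error can be sketched as a small function. It covers only a simplified subset of the RDFa 1.1 rules for the value of @property (@datatype, @typeof and relative IRI resolution are ignored): @content wins; otherwise, if neither @rel nor @rev is present, @resource, @href or @src supplies an IRI; otherwise the element's text becomes a literal. Adding @content is shown as one possible fix, not necessarily the one the author recommends.

```python
from typing import Dict, Tuple


def property_value(attrs: Dict[str, str], text: str) -> Tuple[str, str]:
    """Return ("literal", value) or ("iri", value) for an element with @property.

    Simplified subset of the RDFa 1.1 rules: @datatype, @typeof and
    relative IRI resolution are ignored.
    """
    if "content" in attrs:
        return ("literal", attrs["content"])
    if "rel" not in attrs and "rev" not in attrs:
        # @resource takes precedence over @href, which takes precedence over @src.
        for name in ("resource", "href", "src"):
            if name in attrs:
                return ("iri", attrs[name])
    return ("literal", text)


# The markup from the slide: @href silently becomes the value of schema:title.
print(property_value({"property": "title", "href": "#title"}, "SEO guru"))
# One possible fix: state the literal explicitly with @content.
print(property_value({"property": "title", "href": "#title", "content": "SEO guru"}, "SEO guru"))
```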
  • 18. Empty literal value Most common path to error: schema:addressRegion Possible cause: incomplete data used to generate HTML from fixed templates. Less common in manually marked-up HTML
  • 19. Incorrect character case in schema:Postaladdress (instead of schema:PostalAddress) Both RDFa and HTML5 Microdata are case-sensitive. Spread across 116 unique PLDs. “The default mode of authoring [Schema.org markup] is copy and edit.” (R.V. Guha)
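Because the mistake lies only in letter case, a validator could suggest the intended term by comparing IRIs case-insensitively against the vocabulary. This is a hypothetical helper, not a feature the slides attribute to the validator, and the vocabulary passed in is a two-term excerpt.

```python
from typing import Iterable, Optional

SCHEMA = "http://schema.org/"


def suggest_case_fix(iri: str, vocabulary: Iterable[str]) -> Optional[str]:
    """Return the vocabulary term that matches iri except for letter case.

    Returns None if iri is already a known term or has no case-insensitive match.
    If two terms differed only in case, an arbitrary one would be returned.
    """
    terms = set(vocabulary)
    if iri in terms:
        return None
    by_lower = {term.lower(): term for term in terms}
    return by_lower.get(iri.lower())


vocabulary = {SCHEMA + "PostalAddress", SCHEMA + "JobPosting"}
print(suggest_case_fix(SCHEMA + "Postaladdress", vocabulary))
```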
  • 20. Object property used as datatype property Most common path to error: schema:jobLocation Common cause: simpler markup without intermediate resources, i.e. <p property="jobLocation">Munich</p> instead of <p rel="jobLocation"> <p rel="address"> <p property="addressLocality">Munich</p> </p> </p>
  • 21. Unsuccessful experiments Web Data Commons ● Errors smoothed over by the extraction to RDF ● Not suitable as a source of seed URLs: job postings disappear quickly Veterans Job Bank ● Data from few PLDs, lacks variety ● Severe restrictions on automated downloads through its API
  • 22. Questions? Acknowledgements: The presented research was partially supported by the project of Operational Programme Human Resources and Employment no. CZ.1.04/5.1.01/77.00440. Image credits: Check List designed by Arthur Shlain from thenounproject.com; Puzzle designed by John from thenounproject.com