Combining Ontology Matchers via Anomaly Detection


In ontology alignment, there is no single best-performing matching algorithm for every matching problem. Thus, most modern matching systems combine several base matchers and aggregate their results into a final alignment. This combination is often based on simple voting or averaging, or uses existing matching problems to learn a combination policy in a supervised setting. In this paper, we present the COMMAND matching system, an unsupervised method for combining base matchers, which uses anomaly detection to produce an alignment from the results delivered by several base matchers. The basic idea of our approach is that in a large set of potential mapping candidates, the scarce actual mappings should be visible as anomalies against the majority of non-mappings. The approach is evaluated on several OAEI datasets and performs competitively with state-of-the-art systems.


Combining Ontology Matchers
via Anomaly Detection
Alexander C. Müller and Heiko Paulheim

10/13/15 Alexander C. Müller, Heiko Paulheim 2
Motivation
• Most high-performing matching systems use multiple matchers
• How to combine multiple matchers into a single result?
• Common approaches (a selection)
– average, maximum, minimum matching score
– voting
– expert-modeled weights (e.g., 0.4·m1 + 0.3·m2 + 0.3·m3)
– supervised learning
• Proposal:
– use anomaly detection as an unsupervised aggregation method
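
The common aggregation approaches listed above can be illustrated with a minimal sketch. The function name, the voting threshold of 0.5, and the example scores are assumptions made for illustration; only the weights 0.4, 0.3, 0.3 are taken from the slide.

```python
from statistics import mean


def aggregate(scores, weights=None, vote_threshold=0.5):
    """Combine the scores of several base matchers for ONE candidate pair.

    scores: one similarity score in [0, 1] per base matcher.
    weights: optional expert-chosen weights, one per matcher.
    vote_threshold: a matcher "votes" for the pair if its score reaches
        this value (an assumed convention for this sketch).
    """
    result = {
        "average": mean(scores),
        "maximum": max(scores),
        "minimum": min(scores),
        # Majority voting: accept if more than half of the matchers vote yes.
        "vote": sum(s >= vote_threshold for s in scores) > len(scores) / 2,
    }
    if weights is not None:
        # Expert-modeled linear combination, e.g. 0.4*m1 + 0.3*m2 + 0.3*m3.
        result["weighted"] = sum(w * s for w, s in zip(weights, scores))
    return result


# Hypothetical scores of three matchers m1, m2, m3 for a single pair.
combined = aggregate([0.9, 0.2, 0.7], weights=[0.4, 0.3, 0.3])
```

All of these rules need either a fixed formula or hand-tuned parameters, which is the gap an unsupervised aggregation method is meant to close.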

Idea
• Common definitions of anomaly/outlier detection:
– Outlier or anomaly detection methods are used to find instances "that appear to deviate markedly from other members of the same sample", i.e.,
– instances "that appear to be inconsistent with the remainder of the data"
• Rationale:
– for two ontologies with n and m concepts, there are n × m candidates
– the majority are non-matches
– the actual matches are a minority (that differ markedly from the rest)
– so, we should be able to identify them as outliers
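
To make the rationale above concrete, the following sketch computes how small the share of matches can be. It assumes a one-to-one alignment (at most min(n, m) matches), which the slides do not state, and the ontology sizes are hypothetical.

```python
def match_fraction_upper_bound(n, m):
    """Upper bound on the share of true matches among all n * m candidates,
    ASSUMING a one-to-one alignment (at most min(n, m) matches)."""
    candidates = n * m
    max_matches = min(n, m)
    return candidates, max_matches, max_matches / candidates


# Hypothetical ontology sizes: 100 and 120 concepts.
candidates, max_matches, fraction = match_fraction_upper_bound(100, 120)
print(f"{candidates} candidates, at most {max_matches} matches "
      f"({fraction:.2%} of all candidates)")
```

Under this assumption the share of matches is at most 1 / max(n, m), so it shrinks as the ontologies grow.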

Outlier Detection in a Nutshell
• Given a set of instances as feature vectors
– outlier detection assigns an outlier score to each instance
– higher outlier scores ↔ higher degree of outlierness
• Common approaches
– distance based
– density based
– clustering based
– model based
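
As an illustration of the distance-based family, here is a minimal sketch of a k-nearest-neighbour outlier score. It is one common variant chosen for this example, not necessarily the detector used in COMMAND, and the data points are made up.

```python
import math


def knn_outlier_scores(vectors, k=2):
    """Distance-based outlier score: mean Euclidean distance of each
    instance to its k nearest neighbours (larger = more outlying).
    Requires more than k instances."""
    scores = []
    for i, v in enumerate(vectors):
        # Distances from instance i to every other instance, ascending.
        dists = sorted(math.dist(v, w) for j, w in enumerate(vectors) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Four points close to the origin and one point far away (hypothetical data).
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.9, 0.8)]
scores = knn_outlier_scores(data, k=2)
```

The isolated last point receives the highest score, matching the reading "higher outlier score, higher degree of outlierness".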

Aggregating Matchers via Anomaly Detection
• We run a set of base matchers
• Each base matcher score becomes a numerical feature
• Thus, our feature vectors consist of individual matching scores
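
A minimal sketch of how such feature vectors could be built is shown here. The three string-based matchers and the concept labels are stand-ins chosen for illustration; they are not the base matchers used in COMMAND.

```python
from difflib import SequenceMatcher
from itertools import product


def exact_match(a, b):
    """1.0 if the labels are equal ignoring case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def sequence_similarity(a, b):
    """Character-level similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a, b):
    """Jaccard overlap of the whitespace-separated tokens of both labels."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


# Hypothetical set of base matchers; each one contributes one feature.
BASE_MATCHERS = [exact_match, sequence_similarity, token_jaccard]


def feature_vectors(labels_1, labels_2, matchers=BASE_MATCHERS):
    """Map every candidate pair to its vector of base matcher scores."""
    return {
        (a, b): [m(a, b) for m in matchers]
        for a, b in product(labels_1, labels_2)
    }


features = feature_vectors(["Paper", "Author"],
                           ["Article", "Author", "Paper Author"])
```

Each of the 2 × 3 candidate pairs ends up with one numerical feature per matcher, which is the input an outlier detector expects.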

• Example from the conference dataset
– note: reduced to two dimensions!

COMMAND: Full Pipeline
• Run set of element-based matchers
– find non-correlated subset
• Run set of structure-based matchers on that subset
• Collect all results into feature vectors
• Perform dimensionality reduction
– removing correlated matchers
– Principal Component Analysis
• Run outlier detection
• Perform optional repair step
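
The dimensionality reduction step can be sketched as follows, assuming NumPy is available. The correlation cut-off of 0.9, the greedy filtering order, and the synthetic score matrix are assumptions for illustration, not details taken from the slides.

```python
import numpy as np


def drop_correlated(X, threshold=0.9):
    """Greedily keep a column (matcher) only if its absolute Pearson
    correlation with every column kept so far is below `threshold`
    (an assumed cut-off)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, i] < threshold for i in kept):
            kept.append(j)
    return X[:, kept], kept


def pca(X, n_components=2):
    """Project X onto its first principal components via SVD."""
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Synthetic scores of four matchers for 50 candidate pairs.
rng = np.random.default_rng(0)
a, b, c = rng.random(50), rng.random(50), rng.random(50)
X = np.column_stack([a, 2 * a + 0.01, b, c])  # column 1 is redundant with column 0

X_reduced, kept = drop_correlated(X)
X_2d = pca(X_reduced, n_components=2)
```

Here the redundant second matcher is dropped before PCA compresses the remaining scores to two dimensions.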

COMMAND: Full Pipeline
• Run set of element-based matchers (28 different ones)
– find non-correlated subset
• Run set of structure-based matchers (five different ones) on that subset
• Collect all results into feature vectors
• Perform dimensionality reduction
– removing correlated matchers
– Principal Component Analysis
• Run outlier detection
• Normalize outlier scores
• Select mapping candidates
• Perform optional repair step
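
The two steps "normalize outlier scores" and "select mapping candidates" could look like the sketch below. Min-max normalization, the threshold of 0.5, and the greedy one-to-one selection are all assumptions made for illustration; the slides do not specify how COMMAND implements these steps.

```python
def min_max_normalize(scores):
    """Rescale raw outlier scores linearly to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def select_mappings(candidates, outlier_scores, threshold=0.5):
    """Keep candidate pairs whose normalized outlier score reaches the
    threshold, visiting them from highest to lowest score and using each
    concept at most once (assumed one-to-one alignment)."""
    normalized = min_max_normalize(outlier_scores)
    ranked = sorted(zip(candidates, normalized),
                    key=lambda item: item[1], reverse=True)
    used_1, used_2, alignment = set(), set(), []
    for (a, b), score in ranked:
        if score < threshold:
            break  # all remaining pairs score even lower
        if a in used_1 or b in used_2:
            continue  # one of the concepts is already mapped
        alignment.append((a, b, score))
        used_1.add(a)
        used_2.add(b)
    return alignment


# Hypothetical candidate pairs with raw outlier scores.
candidates = [("Paper", "Article"), ("Paper", "Author"),
              ("Author", "Author"), ("Author", "Article")]
raw_scores = [4.0, 1.0, 5.0, 1.5]
alignment = select_mappings(candidates, raw_scores)
```

The normalized score doubles as a confidence value for each selected mapping, which a subsequent repair step could use when resolving conflicts.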

COMMAND: Results
• Good results on biblio benchmark dataset
– up to 67% F-measure
• Median results on conference
– up to 68% F-measure
• Difficulties on anatomy dataset
– only a subset of matchers could be run for scalability reasons
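
For reference, the F-measure reported above is the harmonic mean of precision and recall of the produced alignment against a reference alignment. The sketch below shows the computation on made-up alignments; the pairs are hypothetical and unrelated to the reported results.

```python
def precision_recall_f1(found, reference):
    """Precision, recall, and F-measure of a found alignment
    against a reference alignment (both given as sets of pairs)."""
    found, reference = set(found), set(reference)
    true_positives = len(found & reference)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical reference and system alignments.
reference = {("Paper", "Article"), ("Author", "Author"), ("Chair", "Chairman")}
found = {("Paper", "Article"), ("Author", "Author"),
         ("Review", "Reviewer"), ("Paper", "Author")}
precision, recall, f1 = precision_recall_f1(found, reference)
```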

Discussion and Conclusion
• Proof of Concept
– Anomaly detection is suitable for matcher aggregation
– non-trivial combination of matcher scores (PCA, outlier score)
– automatic selection of a suitable subset of matchers
• Future work
– address scalability issues
– try more anomaly detection approaches
