In ontology alignment, there is no single best-performing matching algorithm for every matching problem. Thus, most modern matching systems combine several base matchers and aggregate their results into a final alignment. This combination is often based on simple voting or averaging, or uses existing matching problems to learn a combination policy in a supervised setting. In this paper, we present the COMMAND matching system, an unsupervised method for combining base matchers, which uses anomaly detection to produce an alignment from the results delivered by several base matchers. The basic idea of our approach is that in a large set of potential mapping candidates, the scarce actual mappings should be visible as anomalies against the majority of non-mappings. The approach is evaluated on different OAEI datasets and achieves performance competitive with state-of-the-art systems.
2. 10/13/15 Alexander C. Müller, Heiko Paulheim 2
Motivation
• Most high-performing matching systems use multiple matchers
• How to combine multiple matchers into a single result?
• Common approaches (a selection)
– average, maximum, minimum matching score
– voting
– expert-modeled weights (e.g., 0.4·m1 + 0.3·m2 + 0.3·m3)
– supervised learning
• Proposal:
– use anomaly detection as an unsupervised aggregation method
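For comparison, the common aggregation strategies listed above could be sketched as follows. This is only an illustration: the three scores and the voting threshold of 0.5 are made-up assumptions, and only the weights 0.4/0.3/0.3 come from the slide.

```python
from statistics import mean

# Hypothetical scores of three base matchers (m1, m2, m3) for ONE candidate pair.
scores = [0.9, 0.6, 0.3]

def aggregate_average(scores):
    """Average matching score."""
    return mean(scores)

def aggregate_max(scores):
    """Most optimistic matcher wins."""
    return max(scores)

def aggregate_min(scores):
    """Most pessimistic matcher wins."""
    return min(scores)

def aggregate_vote(scores, threshold=0.5):
    """Majority voting: a matcher votes 'match' if its score reaches the
    (assumed) threshold; the pair is accepted if more than half vote 'match'."""
    votes = sum(1 for s in scores if s >= threshold)
    return votes > len(scores) / 2

def aggregate_weighted(scores, weights=(0.4, 0.3, 0.3)):
    """Expert-modeled weights, as in 0.4*m1 + 0.3*m2 + 0.3*m3."""
    return sum(w * s for w, s in zip(weights, scores))

print(aggregate_average(scores), aggregate_max(scores), aggregate_min(scores))
print(aggregate_vote(scores), aggregate_weighted(scores))
```

All of these need either a fixed rule or hand-tuned parameters, which is the gap an unsupervised aggregation method is meant to fill.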
Idea
• Common definitions of anomaly/outlier detection:
– Outlier or anomaly detection methods are used to find instances "that appear to deviate markedly from other members of the same sample", i.e.,
– instances "that appear to be inconsistent with the remainder of the data"
• Rationale:
– for two ontologies with n and m concepts, there are n × m candidates
– the majority are non-matches
– the actual matches are a minority (that differ markedly from the rest)
– so, we should be able to identify them as outliers
Outlier Detection in a Nutshell
• Given a set of instances as feature vectors
– outlier detection assigns an outlier score to each instance
– higher outlier scores ↔ higher degree of outlierness
• Common approaches
– distance based
– density based
– clustering based
– model based
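As a minimal sketch of the distance-based family, the snippet below scores each instance by its mean Euclidean distance to its k nearest neighbours. The function name, the value k=2, and the toy points are assumptions chosen for illustration; this is not necessarily the detector used in COMMAND.

```python
import math

def knn_outlier_scores(points, k=2):
    """Return one outlier score per point: the mean Euclidean distance
    to its k nearest neighbours (higher score = more anomalous)."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Toy data: four points close together and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.9, 0.8)]
scores = knn_outlier_scores(points, k=2)

# Index of the instance with the highest outlier score.
most_anomalous = max(range(len(points)), key=scores.__getitem__)
print(most_anomalous, scores)
```

The isolated point receives by far the highest score, which is the behaviour the matching approach relies on.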
Aggregating Matchers via Anomaly Detection
• We run a set of base matchers
• Each base matcher score becomes a numerical feature
• Thus, our feature vectors consist of individual matching scores
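The construction of feature vectors can be sketched as below. The three base matchers (exact label match, character-sequence similarity, token Jaccard) and the concept labels are hypothetical stand-ins for illustration; they are not the matchers shipped with COMMAND.

```python
from difflib import SequenceMatcher
from itertools import product

def exact_match(a, b):
    """1.0 if the labels are equal ignoring case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0

def sequence_similarity(a, b):
    """Character-level similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def token_jaccard(a, b):
    """Jaccard overlap of the whitespace-separated tokens of both labels."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Assumed set of base matchers; each contributes one numerical feature.
BASE_MATCHERS = [exact_match, sequence_similarity, token_jaccard]

def build_feature_vectors(labels1, labels2, matchers=BASE_MATCHERS):
    """Map every candidate pair (n x m in total) to its vector of matcher scores."""
    return {
        (a, b): [m(a, b) for m in matchers]
        for a, b in product(labels1, labels2)
    }

# Hypothetical concept labels from two small ontologies.
onto1 = ["Conference Paper", "Author", "Review"]
onto2 = ["Paper", "Writer", "Review"]
features = build_feature_vectors(onto1, onto2)
print(features[("Review", "Review")])
```

Each of the 3 × 3 candidate pairs becomes one instance for the outlier detector, with one dimension per base matcher.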
Aggregating Matchers via Anomaly Detection
• Example from the conference dataset
– note: reduced to two dimensions!
COMMAND: Full Pipeline
• Run set of element-based matchers (28 different ones)
– find non-correlated subset
• Run set of structure-based matchers (five different ones) on that subset
• Collect all results into feature vectors
• Perform dimensionality reduction
– removing correlated matchers
– Principal Component Analysis
• Run outlier detection
• Normalize outlier scores
• Select mapping candidates
• Perform optional repair step
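Parts of this pipeline can be sketched in a few lines. The sketch below covers only removing correlated matchers, outlier scoring, score normalization, and candidate selection; PCA and the repair step are left out for brevity. The centroid-distance score, the correlation cut-off of 0.9, the selection threshold of 0.5, and all data are assumptions for illustration, not the settings used by COMMAND.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equally long score columns."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def drop_correlated(vectors, max_corr=0.9):
    """Greedily keep a matcher only if it is not highly correlated
    with a matcher that was already kept."""
    columns = list(zip(*vectors))
    kept = []
    for idx, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) < max_corr for k in kept):
            kept.append(idx)
    return [[v[k] for k in kept] for v in vectors], kept

def centroid_distance_scores(vectors):
    """Stand-in outlier score: Euclidean distance to the centroid.
    Note that it flags deviation in any direction, not only high scores."""
    centroid = [mean(col) for col in zip(*vectors)]
    return [math.dist(v, centroid) for v in vectors]

def min_max_normalize(scores):
    """Rescale outlier scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

def select_candidates(pairs, vectors, threshold=0.5):
    """Keep pairs whose normalized outlier score reaches the threshold."""
    reduced, _ = drop_correlated(vectors)
    normalized = min_max_normalize(centroid_distance_scores(reduced))
    return [p for p, s in zip(pairs, normalized) if s >= threshold]

# Hypothetical candidates; the third matcher duplicates the first one.
pairs = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y"), ("C", "X"), ("C", "Y")]
vectors = [
    [0.9, 0.7, 0.9],  # the one plausible match
    [0.1, 0.5, 0.1],
    [0.2, 0.1, 0.2],
    [0.1, 0.4, 0.1],
    [0.2, 0.1, 0.2],
    [0.1, 0.3, 0.1],
]
selected = select_candidates(pairs, vectors)
print(selected)
```

In this toy setting the redundant third matcher is dropped and only the pair with consistently high scores survives the selection.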
COMMAND: Results
• Good results on biblio benchmark dataset
– up to 67% F-measure
• Median results on conference
– up to 68% F-measure
• Difficulties on anatomy dataset
– only a subset of matchers could be run for scalability reasons
Discussion and Conclusion
• Proof of Concept
– Anomaly detection is suitable for matcher aggregation
– non-trivial combination of matcher scores (PCA, outlier score)
– automatic selection of a suitable subset of matchers
• Future work
– address scalability issues
– try more anomaly detection approaches