A number of standards, specifications and techniques of major significance in the field of data-centric networks are described throughout this work. The fundamental simplicity of many of these standards has allowed them to be built upon in both hierarchical and evolutionary ways. For example, the widely used Hypertext Transfer Protocol (HTTP) relies on the TCP/IP protocol suite for routing messages and establishing connections, and on DNS for domain name resolution. Many other specifications apply the structures and techniques inherent in HTTP to contexts other than the one for which HTTP was designed. Notably, Linked Data on the Web, which combines HTTP and RDF, has provided an extremely simple and efficient paradigm for publishing and connecting datasets from all over the world. The number of published datasets grew exponentially between 2007 and 2011, by which time the “Linking Open Data Cloud” contained about 31 billion RDF statements. However, poor data quality, mistakenly connected resources and wrongly stated equivalences are today’s major challenges for the Linked Data community. In this work we introduce a method to rank equivalences in Linked Data. The rank provides an estimate of the likelihood of each equivalence and may help information engineers to better understand and “debug” co-references, i.e., references to entities that are supposed to be equivalent. The major contributions of this work include: i) a novel method to rank equivalences, based on contextual knowledge of the topology of RDF graphs; ii) a pioneering application of recently introduced energy models for graph clustering to Linked Data; iii) a formal probabilistic approach to equivalence mining derived from the classic Fellegi-Sunter theory of record linkage.
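To give a rough feel for the classic Fellegi-Sunter idea mentioned in the abstract, the sketch below scores candidate equivalences by summing log-likelihood ratios over a few comparison fields and then sorts them by that score. It is only an illustration of the textbook weighting scheme, not the method developed in the thesis. The field names (`label`, `type`), the m/u probabilities, the `example.org` URIs and the function names are all hypothetical.

```python
import math

def match_weight(agreements, m_probs, u_probs):
    """Textbook Fellegi-Sunter weight: sum of log2 likelihood ratios.

    agreements: dict field -> bool (do the two resources agree on this field?)
    m_probs:    dict field -> P(agree | the pair is a true match)
    u_probs:    dict field -> P(agree | the pair is a non-match)
    """
    total = 0.0
    for field, agrees in agreements.items():
        m, u = m_probs[field], u_probs[field]
        if agrees:
            total += math.log2(m / u)
        else:
            total += math.log2((1.0 - m) / (1.0 - u))
    return total

def rank_equivalences(candidates, m_probs, u_probs):
    """Return (pair, weight) tuples sorted from most to least likely."""
    scored = [
        (pair, match_weight(agreements, m_probs, u_probs))
        for pair, agreements in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical parameters; in practice these would be estimated from data.
M_PROBS = {"label": 0.9, "type": 0.8}
U_PROBS = {"label": 0.1, "type": 0.5}

# Hypothetical candidate equivalences (e.g. asserted owl:sameAs links).
CANDIDATES = {
    ("http://example.org/a/Rome", "http://example.org/b/Roma"):
        {"label": True, "type": True},
    ("http://example.org/a/Rome", "http://example.org/b/Rome_GA"):
        {"label": True, "type": False},
    ("http://example.org/a/Rome", "http://example.org/b/Paris"):
        {"label": False, "type": False},
}

ranked = rank_equivalences(CANDIDATES, M_PROBS, U_PROBS)
for pair, weight in ranked:
    print(f"{weight:+.3f}  {pair[0]}  ~  {pair[1]}")
```

A higher weight means the observed agreements are more typical of true matches than of non-matches. The independence of fields assumed by the plain sum is a simplification of the classic model.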
(2013). On the likelihood of an equivalence in linked data.
|Title:||On the likelihood of an equivalence in linked data|
|Publication date:||2013|
|Doctoral programme:||Telecommunications and Microelectronics Engineering|
|Scientific disciplinary sector:||ING-INF/05 - Information Processing Systems|
|Type:||Doctoral thesis|
|Citation:||(2013). On the likelihood of an equivalence in linked data.|
|Appears in types:||07 - Doctoral thesis|