(2013). On the likelihood of an equivalence in linked data.

On the likelihood of an equivalence in linked data

BARTOLOMEO, GIOVANNI
2013-01-01

Abstract

A number of standards, specifications and techniques of major significance in the field of data-centric networks are described throughout this work. The fundamental simplicity of many of these standards has allowed them to be built upon in both hierarchical and evolutionary ways. For example, the widely used Hypertext Transfer Protocol (HTTP) relies on the TCP/IP protocol suite for routing messages and establishing connections, and on DNS for domain name resolution. Many other specifications apply the structures and techniques inherent in HTTP to contexts other than the one for which HTTP was designed. Notably, Linked Data on the Web, which combines HTTP and RDF, has provided an extremely simple and efficient paradigm for publishing and connecting datasets from all over the world. The number of published datasets grew exponentially from 2007 to 2011, by which time the “Linking Open Data Cloud” contained about 31 billion RDF statements. However, poor data quality, mistakenly connected resources and wrongly stated equivalences are today’s major challenges for the Linked Data community. In this work we introduce a method to rank equivalences in Linked Data. The rank provides an estimate of the likelihood of each equivalence and may help information engineers to better understand and “debug” co-references, i.e., references to entities that are supposed to be equivalent. The major contributions of this work are: i) a novel method to rank equivalences, based on contextual knowledge of the topology of RDF graphs; ii) a pioneering application of recently introduced energy models for graph clustering to Linked Data; iii) a formal probabilistic approach to equivalence mining derived from the classic Fellegi-Sunter theory of record linkage.
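The abstract names Fellegi-Sunter record linkage as the basis of its probabilistic approach. As a rough illustration of the classic theory only (the thesis's own formulation is not reproduced here), the sketch below scores candidate equivalence pairs by summing per-feature log-likelihood ratios log2(m/u) and ranks them. The feature names (`same_label`, `shared_neighbour`), their m/u probabilities, the resource identifiers and the decision thresholds are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureModel:
    """Agreement probabilities for one comparison feature (hypothetical)."""
    m: float  # P(feature agrees | the two resources denote the same entity)
    u: float  # P(feature agrees | the two resources denote different entities)


# Hypothetical features and probabilities, chosen only for illustration.
EXAMPLE_MODELS = {
    "same_label": FeatureModel(m=0.9, u=0.1),
    "shared_neighbour": FeatureModel(m=0.8, u=0.2),
}


def match_weight(agreements: dict[str, bool], models: dict[str, FeatureModel]) -> float:
    """Fellegi-Sunter composite weight: sum of per-feature log2 likelihood ratios.

    Assumes the features are conditionally independent given the match status.
    """
    total = 0.0
    for name, agrees in agreements.items():
        fm = models[name]
        if agrees:
            total += math.log2(fm.m / fm.u)  # agreement weight
        else:
            total += math.log2((1 - fm.m) / (1 - fm.u))  # disagreement weight
    return total


def classify(weight: float, lower: float, upper: float) -> str:
    """Three-way Fellegi-Sunter decision rule with two thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"  # left for manual review


def rank_pairs(candidates: dict[tuple[str, str], dict[str, bool]],
               models: dict[str, FeatureModel]) -> list[tuple[tuple[str, str], float]]:
    """Rank candidate equivalence pairs from most to least likely."""
    scored = [(pair, match_weight(agr, models)) for pair, agr in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical co-reference candidates between two made-up datasets "a:" and "b:".
    candidates = {
        ("a:Rome", "b:Rome"): {"same_label": True, "shared_neighbour": True},
        ("a:Rome", "b:Rome_Georgia"): {"same_label": True, "shared_neighbour": False},
        ("a:Rome", "b:Milan"): {"same_label": False, "shared_neighbour": False},
    }
    for pair, w in rank_pairs(candidates, EXAMPLE_MODELS):
        print(pair, round(w, 2), classify(w, lower=0.0, upper=3.0))
```

With these made-up values the three pairs receive weights of about 5.17, 1.17 and -5.17, and are classified as "link", "possible link" and "non-link" respectively. Working in log space lets independent pieces of evidence be added rather than multiplied, and the middle band between the two thresholds is where a ranking is most useful to someone debugging co-references.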
2013
2013/2014
Telecommunications and Microelectronics Engineering
26.
Scientific-disciplinary sector ING-INF/05 - Information Processing Systems
English
Doctoral thesis
File: bartolomeo-phd-thesis.pdf (Adobe PDF, 1.01 MB; access restricted to authorized users; license not specified)


Use this identifier to cite or link to this document: https://hdl.handle.net/2108/204181