We propose a highly efficient and effective algorithm to estimate metrics on very large graphs that are based on the neighborhood function: examples of such metrics are the (effective) diameter and (effective) radius or the average distance. In order to efficiently provide good approximations to the size of the neighborhood set of any node, we refer to the MinHash Signatures approach to derive compressed representations of large sparse datasets but preserving similarity. The technique introduced here, named MinHash Signature Estimation (MHSE), exploits the similarity between signatures to estimate the size of the neighborhood function. We show that MHSE is as effective as HyperANF, which is considered the state of art approach for the estimation of the effective diameter of a very large graph. Indeed, performing both parametric (t-test) and non-parametric (Wilcoxon) statistical tests on residuals for average distance, effective diameter and number of connected pairs, the p-values show that MHSE tends to be more statistically significant than HyperANF. On the other hand, we show that MHSE is a very simple and easily distributable algorithm. In addition, by the property of the signatures to preserve similarity between neighborhoods of nodes, the algorithm can be suitably applied to allow to search and estimate the overlapping size of the most similar neighborhood at different distances.

Amati, G., Angelini, S., Gambosi, G., Rossi, G., Vocca, P. (2017). Estimation of distance-based metrics for very large graphs with MinHash Signatures. In 2017 IEEE International Conference on Big Data, Big Data 2017 (pp.536-545). IEEE [10.1109/BigData.2017.8257969].

Estimation of distance-based metrics for very large graphs with MinHash Signatures

Giambattista Amati;Giorgio Gambosi;Gianluca Rossi;Paola Vocca
2017-01-01

Abstract

We propose a highly efficient and effective algorithm to estimate metrics on very large graphs that are based on the neighborhood function: examples of such metrics are the (effective) diameter and (effective) radius or the average distance. In order to efficiently provide good approximations to the size of the neighborhood set of any node, we refer to the MinHash Signatures approach to derive compressed representations of large sparse datasets but preserving similarity. The technique introduced here, named MinHash Signature Estimation (MHSE), exploits the similarity between signatures to estimate the size of the neighborhood function. We show that MHSE is as effective as HyperANF, which is considered the state of art approach for the estimation of the effective diameter of a very large graph. Indeed, performing both parametric (t-test) and non-parametric (Wilcoxon) statistical tests on residuals for average distance, effective diameter and number of connected pairs, the p-values show that MHSE tends to be more statistically significant than HyperANF. On the other hand, we show that MHSE is a very simple and easily distributable algorithm. In addition, by the property of the signatures to preserve similarity between neighborhoods of nodes, the algorithm can be suitably applied to allow to search and estimate the overlapping size of the most similar neighborhood at different distances.
5th IEEE International Conference on Big Data, Big Data 2017
usa
2017
Rilevanza internazionale
contributo
11-dic-2017
2017
Settore ING-INF/01 - ELETTRONICA
English
Approximate Estimation; Effective Diameter; HyperANF; MinHash Signature; Probabilistic Counters;
Intervento a convegno
Amati, G., Angelini, S., Gambosi, G., Rossi, G., Vocca, P. (2017). Estimation of distance-based metrics for very large graphs with MinHash Signatures. In 2017 IEEE International Conference on Big Data, Big Data 2017 (pp.536-545). IEEE [10.1109/BigData.2017.8257969].
Amati, G; Angelini, S; Gambosi, G; Rossi, G; Vocca, P
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2108/198297
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? 0
social impact