Most of existing cardinality estimation algorithms do not support natively interval queries under a sliding window model and are thereby insensitive to data recency. We present Staggered-HyperLogLog (ST-HLL), a probabilistic data structure that takes inspiration from HyperLogLog (HLL) and provides nearly continuous-time estimation of cardinality rates, rather than absolute counts. Our solution has zero-bit overhead with respect to vanilla HLL and negligible additional computational complexity. It is based on a periodic staggered reset of HLL registers and a register equalization operation at query times to compensate for staggered counting. We tested ST-HLL over both synthetic and real Internet traffic traces, showing its ability to track variations of the flow cardinality, quickly adapting to variations under non-stationary flow arrival processes. We show that for the same amount of memory footprint, our algorithm improves the accuracy up to a factor 2x with respect to the state-of-the-art solution, Sliding HLL.

Cornacchia, A., Bianchi, G., Bianco, A., Giaccone, P. (2022). Staggered HLL: near-continuous-time cardinality estimation with no overhead. COMPUTER COMMUNICATIONS, 193, 168-175 [10.1016/j.comcom.2022.06.038].

Staggered HLL: near-continuous-time cardinality estimation with no overhead

Bianchi G.;
2022-01-01

Abstract

Most of existing cardinality estimation algorithms do not support natively interval queries under a sliding window model and are thereby insensitive to data recency. We present Staggered-HyperLogLog (ST-HLL), a probabilistic data structure that takes inspiration from HyperLogLog (HLL) and provides nearly continuous-time estimation of cardinality rates, rather than absolute counts. Our solution has zero-bit overhead with respect to vanilla HLL and negligible additional computational complexity. It is based on a periodic staggered reset of HLL registers and a register equalization operation at query times to compensate for staggered counting. We tested ST-HLL over both synthetic and real Internet traffic traces, showing its ability to track variations of the flow cardinality, quickly adapting to variations under non-stationary flow arrival processes. We show that for the same amount of memory footprint, our algorithm improves the accuracy up to a factor 2x with respect to the state-of-the-art solution, Sliding HLL.
2022
Non pubblicato
Rilevanza internazionale
Articolo
Esperti anonimi
Settore ING-INF/03
Settore IINF-03/A - Telecomunicazioni
English
Cardinality estimation
HyperLogLog counters
Network monitoring
Probabilistic data structures
Cornacchia, A., Bianchi, G., Bianco, A., Giaccone, P. (2022). Staggered HLL: near-continuous-time cardinality estimation with no overhead. COMPUTER COMMUNICATIONS, 193, 168-175 [10.1016/j.comcom.2022.06.038].
Cornacchia, A; Bianchi, G; Bianco, A; Giaccone, P
Articolo su rivista
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2108/400305
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? 2
social impact