In this paper, we present a solution to the DEBS 2016 Grand Challenge that leverages Apache Flink, an open source platform for distributed stream and batch processing. We design the system architecture focusing on the exploitation of parallelism and memory efficiency so to enable an effective processing of high volume data streams on a distributed infrastructure. Our solution to the first query relies on a distributed and fine-grain approach for updating the post scores and determining partial ranks, which are then merged into a single final rank. Furthermore, changes in the final rank are identified so to update the output only if needed. The second query efficiently represents in-memory the evolving social graph and uses a customized Bron-Kerbosch algorithm to identify the largest communities active on a topic. We leverage on an in-memory caching system to keep the largest connected components which have been previously identified by the algorithm, thus saving computational time. The experimental results show that, on a portion of the dataset large half that provided for the Grand Challenge, our system can process up to 400 tuples/s with an average latency of 2.5 ms for the first query, and up to 370 tuples/s with an average latency of 2.7 ms for the second query.

Marciani, G., Piu, M., Porretta, M., Nardelli, M., Cardellini, V. (2016). Grand challenge: Real-time analysis of social networks leveraging the Flink framework. In DEBS 2016 - Proceedings of the 10th ACM International Conference on Distributed and Event-Based Systems (pp.386-389). Association for Computing Machinery, Inc [10.1145/2933267.2933517].

Grand challenge: Real-time analysis of social networks leveraging the Flink framework

CARDELLINI, VALERIA
2016

Abstract

In this paper, we present a solution to the DEBS 2016 Grand Challenge that leverages Apache Flink, an open source platform for distributed stream and batch processing. We design the system architecture focusing on the exploitation of parallelism and memory efficiency so to enable an effective processing of high volume data streams on a distributed infrastructure. Our solution to the first query relies on a distributed and fine-grain approach for updating the post scores and determining partial ranks, which are then merged into a single final rank. Furthermore, changes in the final rank are identified so to update the output only if needed. The second query efficiently represents in-memory the evolving social graph and uses a customized Bron-Kerbosch algorithm to identify the largest communities active on a topic. We leverage on an in-memory caching system to keep the largest connected components which have been previously identified by the algorithm, thus saving computational time. The experimental results show that, on a portion of the dataset large half that provided for the Grand Challenge, our system can process up to 400 tuples/s with an average latency of 2.5 ms for the first query, and up to 370 tuples/s with an average latency of 2.7 ms for the second query.
10th ACM International Conference on Distributed and Event-Based Systems, DEBS 2016
Irvine, CA, USA
2016
ACM SIGMOD
Rilevanza internazionale
giu-2016
Settore ING-INF/05 - Sistemi di Elaborazione delle Informazioni
English
Flink; Real-time data processing; Social-network graphs;
http://dl.acm.org/citation.cfm?id=2933517
Intervento a convegno
Marciani, G., Piu, M., Porretta, M., Nardelli, M., Cardellini, V. (2016). Grand challenge: Real-time analysis of social networks leveraging the Flink framework. In DEBS 2016 - Proceedings of the 10th ACM International Conference on Distributed and Event-Based Systems (pp.386-389). Association for Computing Machinery, Inc [10.1145/2933267.2933517].
Marciani, G; Piu, M; Porretta, M; Nardelli, M; Cardellini, V
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/2108/172028
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 4
  • ???jsp.display-item.citation.isi??? ND
social impact