Marciani, G., Piu, M., Porretta, M., Nardelli, M., & Cardellini, V. (2016). Grand challenge: Real-time analysis of social networks leveraging the Flink framework. In Proceedings of the 10th ACM International Conference on Distributed and Event-Based Systems (DEBS 2016), pp. 386-389. Association for Computing Machinery. https://doi.org/10.1145/2933267.2933517
Grand challenge: Real-time analysis of social networks leveraging the Flink framework
CARDELLINI, VALERIA
2016-01-01
Abstract
In this paper, we present a solution to the DEBS 2016 Grand Challenge that leverages Apache Flink, an open source platform for distributed stream and batch processing. We design the system architecture focusing on the exploitation of parallelism and memory efficiency, so as to enable effective processing of high-volume data streams on a distributed infrastructure. Our solution to the first query relies on a distributed and fine-grained approach for updating the post scores and determining partial ranks, which are then merged into a single final rank. Furthermore, changes in the final rank are identified so that the output is updated only when needed. The second query efficiently represents the evolving social graph in memory and uses a customized Bron-Kerbosch algorithm to identify the largest communities active on a topic. We leverage an in-memory caching system to keep the largest connected components previously identified by the algorithm, thus saving computational time. The experimental results show that, on a portion of the dataset half as large as the one provided for the Grand Challenge, our system can process up to 400 tuples/s with an average latency of 2.5 ms for the first query, and up to 370 tuples/s with an average latency of 2.7 ms for the second query.
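As a point of reference for the community-detection step of the second query, the following is a minimal sketch of the classic Bron-Kerbosch maximal-clique enumeration, written in Java on the assumption that the Flink-based system is JVM-based. It is not the customized, cache-aware variant described in the paper; the BronKerbosch class name, the integer vertex identifiers, and the adjacency-map graph representation are illustrative assumptions.

```java
import java.util.*;

/** Minimal sketch of the classic Bron-Kerbosch algorithm (no pivoting, no caching)
 *  for enumerating maximal cliques in an undirected graph given as an adjacency map.
 *  Illustrative only; the paper uses a customized variant over the evolving social graph. */
public class BronKerbosch {

    private final Map<Integer, Set<Integer>> adj;          // vertex -> neighbours
    private final List<Set<Integer>> cliques = new ArrayList<>();

    public BronKerbosch(Map<Integer, Set<Integer>> adjacency) {
        this.adj = adjacency;
    }

    public List<Set<Integer>> maximalCliques() {
        expand(new HashSet<>(), new HashSet<>(adj.keySet()), new HashSet<>());
        return cliques;
    }

    // R: current clique, P: candidate vertices, X: vertices already processed
    private void expand(Set<Integer> r, Set<Integer> p, Set<Integer> x) {
        if (p.isEmpty() && x.isEmpty()) {
            cliques.add(new HashSet<>(r));                  // R is a maximal clique
            return;
        }
        for (Integer v : new ArrayList<>(p)) {
            Set<Integer> neighbours = adj.getOrDefault(v, Collections.emptySet());
            Set<Integer> newR = new HashSet<>(r); newR.add(v);
            Set<Integer> newP = new HashSet<>(p); newP.retainAll(neighbours);
            Set<Integer> newX = new HashSet<>(x); newX.retainAll(neighbours);
            expand(newR, newP, newX);
            p.remove(v);                                    // v has been explored
            x.add(v);
        }
    }

    // Tiny usage example: a triangle {1,2,3} plus an extra edge {3,4}
    public static void main(String[] args) {
        Map<Integer, Set<Integer>> g = new HashMap<>();
        g.put(1, new HashSet<>(Arrays.asList(2, 3)));
        g.put(2, new HashSet<>(Arrays.asList(1, 3)));
        g.put(3, new HashSet<>(Arrays.asList(1, 2, 4)));
        g.put(4, new HashSet<>(Collections.singletonList(3)));
        System.out.println(new BronKerbosch(g).maximalCliques()); // e.g. [[1, 2, 3], [3, 4]]
    }
}
```

According to the abstract, the paper's variant additionally keeps the largest connected components found in earlier evaluations in an in-memory cache, so that the clique enumeration does not restart from scratch on every update of the social graph.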