We first estimate the number of Italian users active on Twitter over the last year by filtering the Italian flow of Twitter. We show that our filter misses about 6.86% of the Italian flow, while 86.80% of the selected tweets are in Italian. Given this accuracy of the filter on the Italian Twitter Firehose, we are able to assess the actual number of Italian active users (AUs) of the platform. We then introduce a massive text document clustering algorithm that is easily applicable and scalable to the Twitter social network. Instead of a topic modeling approach based on feature selection and a conventional clustering algorithm, such as LDA, we apply community detection algorithms to the weighted hashtag graph. To scale with the graph size, we apply two linear community detection algorithms, CoDA and Louvain. Once the hashtags have been assigned to clusters, both the largest clusters and the most frequent hashtags are associated with topics of general interest, such as sports, politics, and health. In this way we are able to provide significant statistics on the topics covered on Twitter in the past year.
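
The idea of clustering hashtags through a weighted graph can be illustrated with a minimal sketch. It assumes that an edge weight is the number of tweets in which two hashtags co-occur, which the abstract does not specify. The function names `build_hashtag_graph` and `label_propagation` are hypothetical, and the clustering step is a simple deterministic label propagation used as a stand-in: it is not CoDA or Louvain, only a placeholder for where such an algorithm would be applied.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_hashtag_graph(tweets):
    """Weighted co-occurrence graph (assumed weighting):
    graph[a][b] = number of tweets containing both hashtags a and b."""
    graph = defaultdict(Counter)
    for hashtags in tweets:
        # Deduplicate and lowercase so "#Calcio" and "#calcio" become one node.
        tags = sorted({h.lower() for h in hashtags})
        for a, b in combinations(tags, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def label_propagation(graph, max_rounds=20):
    """Stand-in clustering (NOT CoDA or Louvain): each hashtag adopts the
    label carrying the largest total edge weight among its neighbours."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            scores = Counter()
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            # Ties go to the alphabetically smallest label, for determinism.
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    clusters = defaultdict(set)
    for node, label in labels.items():
        clusters[label].add(node)
    # Largest clusters first, then alphabetical order of members.
    return sorted(clusters.values(), key=lambda c: (-len(c), sorted(c)))


# Toy input: each tweet is reduced to its list of hashtags (made-up data).
example_tweets = [
    ["#seriea", "#calcio"],
    ["#calcio", "#juve"],
    ["#seriea", "#juve", "#calcio"],
    ["#governo", "#elezioni"],
    ["#elezioni", "#senato", "#governo"],
]
for cluster in label_propagation(build_hashtag_graph(example_tweets)):
    print(sorted(cluster))
```

In a real pipeline the adjacency structure would be handed to a community detection implementation such as Louvain, and each resulting cluster would then be labelled with a topic such as sports or politics.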

Giambattista, A., Simone, A., Antonio, C., Gianmarco, F., Giancarlo, G., Daniele, P., et al. (2021). Topic modeling by community detection algorithms. In Proceedings of the 2021 Workshop on Open Challenges in Online Social Networks. New York: ACM [10.1145/3472720.3483622].

Topic modeling by community detection algorithms

VOCCA P
2021-01-01

2021
Sector INFO-01/A - Computer Science
English
International relevance
Scientific article in conference proceedings
Giambattista, A; Simone, A; Antonio, C; Gianmarco, F; Giancarlo, G; Daniele, P; Vocca, P
Book contribution

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/396799