Zanzotto, F.M., Pennacchiotti, M., Tsioutsiouliklis, K. (2011). Linguistic redundancy in Twitter. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) (GGS Conference Ranking 1 A+) (pp. 659-669). Edinburgh: Association for Computational Linguistics.

Linguistic redundancy in Twitter

Zanzotto, Fabio Massimo
2011-07-01

Abstract

In the last few years, the research community's interest in micro-blogs and social media services, such as Twitter, has grown exponentially. Yet so far little attention has been paid to a key characteristic of micro-blogs: the high level of information redundancy. The aim of this paper is to approach this problem systematically by providing an operational definition of redundancy. We cast redundancy in the framework of Textual Entailment Recognition. We also provide quantitative evidence of the pervasiveness of redundancy in Twitter, and describe a dataset of redundancy-annotated tweets. Finally, we present a general-purpose system for identifying redundant tweets. An extensive quantitative evaluation shows that our system successfully solves the redundancy detection task, improving over baseline systems with statistical significance.
Conference on Empirical Methods in Natural Language Processing (EMNLP)
Edinburgh, UK
2011
Paola Merlo
International relevance
contribution
2011
1 July 2011
Sector ING-INF/05 - Information Processing Systems
Sector INF/01 - Computer Science
English
social media analytics; natural language processing
http://www.aclweb.org/anthology/D11-1061.pdf
Conference paper
Zanzotto, F.M.; Pennacchiotti, M.; Tsioutsiouliklis, K.
Files in this item:
D11-1061.pdf (open access; license: publisher's copyright; size: 304.23 kB; format: Adobe PDF)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/117117
Citations
  • Scopus 42