Zanzotto, F.M., Pennacchiotti, M., Tsioutsiouliklis, K. (2011). Linguistic redundancy in Twitter. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) (GGS Conference Ranking 1 A+) (pp. 659-669). Edinburgh: Association for Computational Linguistics.
Linguistic redundancy in Twitter
ZANZOTTO, FABIO MASSIMO
2011-07-01
Abstract
In the last few years, the interest of the research community in micro-blogs and social media services, such as Twitter, has grown exponentially. Yet so far little attention has been paid to a key characteristic of micro-blogs: the high level of information redundancy. The aim of this paper is to approach this problem systematically by providing an operational definition of redundancy. We cast redundancy in the framework of Textual Entailment Recognition. We also provide quantitative evidence on the pervasiveness of redundancy in Twitter, and describe a dataset of redundancy-annotated tweets. Finally, we present a general-purpose system for identifying redundant tweets. An extensive quantitative evaluation shows that our system successfully solves the redundancy detection task, improving over baseline systems with statistical significance.