The ad hoc task of the Microblog track has important theoretical implications for Information Retrieval. A key problem in Information Retrieval is how to compare term frequencies among documents of different lengths. At first sight, term frequency normalization for microblogging can be simplified because admissible messages are constrained to be short. The shortness of messages reduces the number of admissible values for the document length, so the length of a message can be regarded as small and almost constant. On the other hand, short messages carry only a small amount of information, so they are hard to distinguish from one another by content. To overcome both problems, we propose to use a precise mathematical definition of information, such as the one given by Shannon, to provide an ad hoc IR model for microblog search. We show how to use Shannon’s information theory and coding theory to weight the query content in Twitter messages and retrieve relevant messages.
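
As a rough illustration of the idea of weighting query content by Shannon information, the sketch below scores each message by the summed self-information, -log2(df/N), of the query terms it contains. This is not the model described in the paper, whose details are not given in this abstract. The function names, the whitespace tokenizer, and the use of document frequency as the probability estimate are all assumptions made for the example. Term frequency is ignored on the assumption that message length is small and almost constant.

```python
import math
from collections import Counter


def information_weights(messages):
    """Map each term to its self-information in bits: -log2(df / N).

    Assumption: the probability of a term is estimated by the fraction
    of messages that contain it (document frequency over collection size).
    """
    n = len(messages)
    df = Counter()
    for msg in messages:
        # Count each term once per message (presence, not frequency).
        df.update(set(msg.lower().split()))
    return {term: -math.log2(count / n) for term, count in df.items()}


def score(query, message, weights):
    """Sum the information of the distinct query terms found in the message."""
    message_terms = set(message.lower().split())
    query_terms = set(query.lower().split())
    return sum(weights.get(t, 0.0) for t in query_terms if t in message_terms)


def rank(query, messages):
    """Return (score, message) pairs sorted by decreasing score."""
    weights = information_weights(messages)
    scored = [(score(query, m, weights), m) for m in messages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Under these assumptions, a rare query term contributes more bits to the score than a common one, so a message matching the rarer terms of the query ranks higher even though every message has roughly the same length.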

Amati, G., Amodeo, G., Bianchi, M., Celi, A., De Nicola, C., Flammini, M., et al. (2011). FUB, IASI-CNR, UNIVAQ at TREC 2011 Microblog track. Presented at The 20th Text REtrieval Conference, Gaithersburg, MD, USA.

FUB, IASI-CNR, UNIVAQ at TREC 2011 Microblog track

GAMBOSI, GIORGIO
2011-11-01

The 20th Text REtrieval Conference
Gaithersburg, MD, USA
2011
20
International relevance
contribution
Nov 2011
Sector INF/01 - Computer Science
English
Conference presentation
Amati, G; Amodeo, G; Bianchi, M; Celi, A; De Nicola, C; Flammini, M; Gaibisso, C; Gambosi, G; Marcone, G
Files in this item:
File: FUB.microblog.update.pdf
Access: open access
License: Not specified
Size: 241.37 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/106621