Text clustering is an unsupervised process of classifying texts and words into different groups. In the literature, many algorithms use a bag-of-words model to represent texts and classify their content. The bag-of-words model assumes that word order has no significance. The aim of this article is to propose a new method of text clustering that considers the links between terms and documents. We use centrality measures to assess the importance of words and texts in a corpus and to classify documents sequentially.
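As a rough illustration of the two ideas named in the abstract, the sketch below builds a bag-of-words representation and scores each term by its degree centrality in a term-document bipartite graph. This is not the method of the article, which is not described in this record: the function names `bag_of_words` and `term_degree_centrality`, the whitespace tokenizer, the choice of degree centrality, and the toy corpus are all assumptions made for the example.

```python
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens; word order is discarded."""
    return Counter(text.lower().split())


def term_degree_centrality(docs):
    """Degree centrality of each term in a term-document bipartite graph.

    A term is linked to every document that contains it, so its degree is
    its document frequency. Dividing by the number of documents scales
    the score to the interval [0, 1].
    """
    degree = Counter()
    for doc in docs:
        # One edge per (term, document) pair, regardless of term frequency.
        degree.update(bag_of_words(doc).keys())
    n_docs = len(docs)
    return {term: deg / n_docs for term, deg in degree.items()}


# Hypothetical toy corpus, used only to exercise the functions.
docs = [
    "clustering groups similar texts",
    "centrality ranks terms in texts",
    "texts link terms and documents",
]
scores = term_degree_centrality(docs)
```

In this toy corpus the term "texts" occurs in every document and so receives the highest score, while a term confined to a single document receives the lowest. Other centrality measures (closeness, betweenness, eigenvector) could be substituted on the same graph.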
Iezzi, D. (2012). Centrality measures for text clustering. Communications in Statistics, Theory and Methods, 41(16-17), 3179-3197 [10.1080/03610926.2011.633729].
Centrality measures for text clustering
IEZZI, DOMENICA
2012-01-01