Bombelli, I., Iezzi, D.F., Seri, E., Vichi, M. (2026). Spherical double k-means: a co-clustering approach for textual data analysis. Journal of Classification [10.1007/s00357-026-09544-7].

Spherical double k-means: a co-clustering approach for textual data analysis

IEZZI D. F.; SERI E.
Methodology
2026-03-01

Abstract

In text analysis, spherical k-means (SKM) is a specialized k-means clustering algorithm widely used for grouping documents represented in high-dimensional, sparse term-document matrices, often normalized using techniques like TF-IDF. Researchers frequently seek to cluster not only documents but also the terms associated with them into coherent groups. To address this dual clustering requirement, we introduce spherical double k-means (SDKM), a novel methodology that simultaneously clusters documents and terms. This methodology offers several advantages: more effective topic identification and keyword extraction, greater interpretability, computational efficiency, and the ability to capture dynamic changes in thematic content over time. It also helps uncover nuanced patterns and structures in textual data. We apply SDKM to simulated and real data. The real-data applications use the corpus of US presidential inaugural addresses, spanning from George Washington in 1789 to Joe Biden in 2021, and the 20 Newsgroups corpus. Our analysis reveals distinct clusters of words and documents that correspond to significant themes and periods, showcasing the method’s ability to facilitate a deeper understanding of the data. Our findings demonstrate the efficacy of SDKM in uncovering underlying patterns in textual data.
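
To make the abstract's starting point concrete, the following is a minimal sketch of plain spherical k-means (SKM), the baseline the abstract mentions: rows are scaled to unit length, assigned to the centroid with the highest cosine similarity, and centroids are re-normalized after each update. It is not the authors' SDKM algorithm, whose details are not given in this record. The toy matrix, the function names, and the idea of running SKM once on the rows and once on the transposed matrix are assumptions made purely for illustration; two separate runs are only a naive stand-in for the simultaneous document and term clustering that SDKM performs.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit Euclidean length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Inner product; for unit vectors this equals the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(rows, k, n_iter=20, seed=0):
    """Plain spherical k-means on matrix rows (illustrative, not the paper's SDKM)."""
    rng = random.Random(seed)
    data = [normalize(r) for r in rows]
    centroids = [list(c) for c in rng.sample(data, k)]  # k rows as starting centroids
    labels = None
    for _ in range(n_iter):
        # Assignment step: each row joins the centroid with the highest cosine similarity.
        new_labels = [max(range(k), key=lambda j: dot(x, centroids[j])) for x in data]
        if new_labels == labels:  # assignments unchanged, so the centroids are stable
            break
        labels = new_labels
        # Update step: sum each cluster's members and project back onto the unit sphere.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids


# Hypothetical toy weights: 5 documents (rows) by 4 terms (columns).
docs = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [3, 3, 0, 0],
    [0, 0, 2, 3],
    [0, 0, 3, 2],
]

doc_labels, doc_centroids = spherical_kmeans(docs, k=2)

# Naive stand-in for the "double" idea: cluster the terms by a second, separate run
# on the transposed matrix. SDKM instead clusters both modes simultaneously.
terms = [list(col) for col in zip(*docs)]
term_labels, term_centroids = spherical_kmeans(terms, k=2)

print("document clusters:", doc_labels)
print("term clusters:", term_labels)
```

On this toy matrix the first three documents share the first two terms and the last two documents share the remaining terms, so both runs should recover that block structure; cluster indices themselves are arbitrary and depend on the random initialization.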
Mar 2026
Online ahead of print
International relevance
Article
Anonymous expert reviewers
Sector SECS-S/05
Sector STAT-03/B - Social Statistics
English
With ISI Impact Factor
Textual data
Co-clustering
Topic modeling
Spherical double k-means
Bombelli, I; Iezzi, DF; Seri, E; Vichi, M
Journal article
Files in this item:
Bombelli_et_al-2026-Journal_of_Classification.pdf

open access

Type: Publisher's version (PDF)
License: Creative Commons
Size: 2.14 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/452743