Multi-class machine learning detection of Edema, Vocal Paralysis and Vocal Nodules through voice


This paper aims to differentiate causes of dysphonia, namely Reinke's Edema, Vocal Cord Paralysis, and Vocal Nodules, also including healthy subjects. A proprietary dataset of 245 subjects underwent acoustic feature extraction and selection, and four classifiers were trained for multi-class classification. Loudness/Energy-related features were among the most effective, which is in line with the fact that the three diseases all cause different impairments in terms of voice volume. Cepstrum is also confirmed as an effective domain. The four classifiers obtained comparable performances, with Random Forest having the highest accuracy at 78.4% and Naïve Bayes offering the best compromise in terms of recall. Healthy subjects always lead to a higher recall, which is in line with the fact that identifying dysphonia is an easier task than differentiating among its causes.
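The distinction the abstract draws between the classifier with the highest accuracy and the one with the best compromise in terms of recall can be illustrated with a small sketch. The two confusion matrices below are hypothetical counts invented for illustration only; they are not the paper's results, and the class order and function names are assumptions. Rows are true classes, columns are predicted classes.

```python
from typing import Dict, List

# Assumed class order for this illustration (not taken from the paper).
CLASSES = ["healthy", "edema", "paralysis", "nodules"]

Matrix = List[List[int]]


def accuracy(cm: Matrix) -> float:
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_recall(cm: Matrix, labels: List[str]) -> Dict[str, float]:
    """Recall of class i = correct predictions for i / true samples of i."""
    return {label: cm[i][i] / sum(cm[i]) for i, label in enumerate(labels)}


def macro_recall(cm: Matrix) -> float:
    """Unweighted mean of the per-class recalls."""
    recalls = [cm[i][i] / sum(cm[i]) for i in range(len(cm))]
    return sum(recalls) / len(recalls)


# HYPOTHETICAL model A: strong on healthy subjects, weaker on the diseases.
CM_A: Matrix = [
    [38, 1, 1, 0],   # healthy   (40 true samples)
    [2, 14, 2, 2],   # edema     (20)
    [2, 3, 13, 2],   # paralysis (20)
    [3, 3, 2, 12],   # nodules   (20)
]

# HYPOTHETICAL model B: slightly fewer correct overall, more balanced recall.
CM_B: Matrix = [
    [32, 3, 3, 2],   # healthy   (40)
    [1, 15, 2, 2],   # edema     (20)
    [2, 1, 15, 2],   # paralysis (20)
    [2, 2, 2, 14],   # nodules   (20)
]

if __name__ == "__main__":
    for name, cm in (("A", CM_A), ("B", CM_B)):
        print(name, round(accuracy(cm), 3), round(macro_recall(cm), 3))
        print(per_class_recall(cm, CLASSES))
```

With these made-up counts, model A has the higher accuracy while model B has the higher macro-averaged recall, because the larger healthy class dominates accuracy but counts as only one of four classes in the recall average.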

Cesarini, V., Robotti, C., Costantini, G. (2024). Multi-class machine learning detection of Edema, Vocal Paralysis and Vocal Nodules through voice. In 2024 IEEE International Workshop on Metrology for Industry 4.0 and IoT: proceedings (pp. 430-434). New York: IEEE [10.1109/MetroInd4.0IoT61288.2024.10584208].

Cesarini V.; Robotti C.; Costantini G.

2024-01-01

Conference name: 7th IEEE International Workshop on Metrology for Industry 4.0 and IoT (MetroInd4.0 and IoT 2024)
Conference location: Firenze, Italy
Conference year: 2024
Conference number: 7
Conference organizer(s): IEEE Italy Section Affinity Group of Women in Engineering
Conference relevance: International
Publication date: 2024
DOI of the contribution: https://dx.doi.org/10.1109/MetroInd4.0IoT61288.2024.10584208
Disciplinary sector of the contribution (valid from 9 May 2024): Settore IIET-01/A - Elettrotecnica (Electrical Engineering)
Content language: English
Keywords: Edema; Machine learning; Speech; Voice
Type: Conference paper
All authors: Cesarini, V; Robotti, C; Costantini, G
Appears in categories: 02 - Conference paper

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/404586
