Jing, S., Mao, X., Chen, L., Comes, M.C., Mencattini, A., Raguso, G., et al. (2018). A closed-form solution to the graph total variation problem for continuous emotion profiling in noisy environment. Speech Communication, 104, 66-72 [10.1016/j.specom.2018.09.006].

A closed-form solution to the graph total variation problem for continuous emotion profiling in noisy environment

Mencattini A.; Di Natale C.; Martinelli E.
2018-01-01

Abstract

Time-continuous emotion estimation (e.g., of arousal and valence) from spontaneous speech has recently drawn increasing commercial attention. However, real-life applications of emotion recognition technology must cope with challenging conditions, such as noise from recording devices and background environments. In this work, we introduce a novel personalized emotion prediction model validated in different noisy environments. The model relies on a three-level noise reduction algorithm: (i) data downsampling, (ii) feature synchronization, and (iii) a modified version of graph total variation. The approach has been validated on the widely used RECOLA database with different types of noise, including convolutive and additive noise at different signal-to-noise ratios (SNRs). Feature synchronization improves the absolute concordance correlation coefficient (CCC) values by 0.271 on average for arousal and 0.137 for valence. The proposed denoising approach further improves the values by 0.101 for arousal and 0.086 for valence. Finally, the proposed model considerably improves the CCC values on raw data and on all types of noisy data, and outperforms the standard denoising methods.
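
For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of the standard concordance correlation coefficient (Lin's definition), not the authors' evaluation code. The function name `concordance_cc` and the toy sequences are assumptions made for illustration; the record does not describe how the paper computes or aggregates CCC across recordings.

```python
from statistics import fmean


def concordance_cc(pred, gold):
    """Concordance correlation coefficient between two equal-length sequences.

    Uses population (biased) variance and covariance, as in Lin's definition:
    CCC = 2 * cov / (var_pred + var_gold + (mean_pred - mean_gold) ** 2)
    """
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("sequences must be non-empty and of equal length")

    mean_p = fmean(pred)
    mean_g = fmean(gold)
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    var_g = fmean([(g - mean_g) ** 2 for g in gold])
    cov = fmean([(p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)])

    # Unlike Pearson correlation, the squared mean difference penalizes bias.
    # The denominator is zero only if both sequences are constant and equal.
    return 2.0 * cov / (var_p + var_g + (mean_p - mean_g) ** 2)


# Hypothetical toy annotation trace and a prediction with a constant offset.
gold_trace = [0.0, 1.0, 2.0, 3.0]
shifted_prediction = [g + 1.0 for g in gold_trace]
ccc_shifted = concordance_cc(shifted_prediction, gold_trace)
```

In this toy case the shifted prediction is perfectly correlated with the reference in the Pearson sense, yet its CCC falls below 1 because of the offset. This is why CCC is commonly preferred for continuous arousal and valence traces.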
2018
Published
International relevance
Article
Anonymous peer reviewers
Sector ING-INF/07 - Electrical and Electronic Measurements
English
Continuous emotion profiling from speech
Graph total variation denoising
Noisy environment
Jing, S; Mao, X; Chen, L; Comes, Mc; Mencattini, A; Raguso, G; Ringeval, F; Schuller, B; Di Natale, C; Martinelli, E
Journal article
Use this identifier to cite or link to this document: https://hdl.handle.net/2108/265387