Continuous Estimation of Emotions in Speech by Dynamic Cooperative Speaker Models

Mencattini, Arianna; Martinelli, Eugenio; Di Natale, Corrado
2017-01-01

Abstract

Automatic emotion recognition from speech has recently focused on the prediction of time-continuous dimensions (e.g., arousal and valence) of spontaneous and realistic expressions of emotion, as found in real-life interactions. However, the automatic prediction of such emotions poses several challenges, such as the subjectivity involved in defining a gold standard from a pool of raters and the scarcity of data for training models. In this work, we introduce a novel emotion recognition system based on an ensemble of single-speaker-regression-models (SSRMs). The emotion estimate is obtained by combining a subset of the initial pool of SSRMs, selecting those that are most concordant with one another. The proposed approach allows speakers to be added to or removed from the ensemble without rebuilding the entire machine learning system. The simplicity of this aggregation strategy, the flexibility afforded by the modular architecture, and the promising results obtained on the RECOLA database highlight the potential of the proposed method in real-life scenarios, in particular in web-based applications.
2017
Published
International relevance
Article
Anonymous reviewers
Sector ING-INF/07 - Electrical and Electronic Measurements
English
Mencattini, A., Martinelli, E., Ringeval, F., Schuller, B., Di Natale, C. (2017). Continuous Estimation of Emotions in Speech by Dynamic Cooperative Speaker Models. IEEE Transactions on Affective Computing, 8(3), 314-327 [10.1109/TAFFC.2016.2531664].
Mencattini, A; Martinelli, E; Ringeval, F; Schuller, B; Di Natale, C
Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/185890
Citations
  • PMC n/a
  • Scopus 43
  • ISI 37