CorAIt is a non-native speech database for Italian that is freely accessible online for academic research purposes. It was specifically designed to meet the requirements of a larger research project on foreign-accented Italian speech. The corpus aims to provide a uniform collection of speech samples uttered by non-native speakers of Italian. To date, 105 non-native speakers, whose mother tongues are French, Romanian, Spanish, English, German, or Russian, have been recorded. The corpus also includes a control group of 16 Italian speakers. There are almost 8 hours of audio material, comprising both read speech (first and second reading) and spontaneous speech. This paper emphasizes the need for this type of database, describes the steps involved in its construction, and presents the features of CorAIt.

Combei, C.R. (2017). CorAIt – A non-native speech database for Italian. In M.N. Roberto Basili (Ed.), Proceedings of the Fourth Italian Conference on Computational Linguistics CLiC-it 2017 (pp. 113-118). Torino: Accademia University Press [10.4000/books.aaccademia.2386].

CorAIt – A non-native speech database for Italian

Claudia Roberta Combei
2017-01-01

2017
Sector L-LIN/01
Sector GLOT-01/A - Glottology and Linguistics
English
International relevance
Scientific article in conference proceedings
non-native speech database
learner corpus
L2 Italian
audio corpus
http://dx.doi.org/10.4000/books.aaccademia.2386
Combei, C.R.
Contribution in book
Files in this item:
File: Combei_2017_Corait_Corpus.pdf
Availability: not available
License: Publisher's copyright
Size: 769.74 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/410696