Croce, D., Zelenanska, A., Basili, R. (2018). Neural Learning for Question Answering in Italian. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp. 389-402). Springer Verlag [10.1007/978-3-030-03840-3_29].
Neural Learning for Question Answering in Italian
Croce, Danilo; Basili, Roberto
2018-11-23
Abstract
Recent breakthroughs in deep learning have led to state-of-the-art results in several NLP tasks, such as Question Answering (QA). Nevertheless, the training requirements are not satisfied in cross-linguistic settings: datasets suitable for training question answering systems in non-English languages are often unavailable, which represents a significant barrier for most neural methods. This paper explores the possibility of acquiring a large-scale, albeit lower-quality, dataset for an open-domain factoid question answering system in Italian. The dataset consists of more than 60 thousand question-answer pairs and was used to train a system able to answer factoid questions against the Italian Wikipedia. The paper describes the dataset and the experiments, which are inspired by an equivalent study for English. The results show that the performance achievable for Italian is lower, although it is already applicable to concrete QA tasks.