Multi-task and Generative Adversarial Learning for Robust and Sustainable Text Classification

Croce D.; Basili R.
2022-01-01

Abstract

Modern neural networks demand large training sets with broad coverage whenever complex inferences are involved. This is the case for offensive language detection, which targets an elusive and multifaceted phenomenon: the recognition of offensive uses of language. In such scenarios, gathering training data can be prohibitively expensive, and the dynamic, multidimensional nature of abusive language also calls for timely, evolving evidence so that training can proceed continuously. The MT-GAN-BERT approach proposed here aims to reduce the requirements of neural approaches in terms of both the amount of annotated data and the computational cost at classification time. It corresponds to a general BERT-based architecture for multifaceted text classification tasks. On the one hand, MT-GAN-BERT enables semi-supervised learning for Transformers based on the Generative Adversarial Learning paradigm. On the other hand, it implements a Multi-task Learning approach that can train over and solve multiple tasks simultaneously. A single BERT-based model encodes the input examples, while multiple linear layers implement the classification steps, which significantly reduces computational costs. In the experimental evaluation, we studied six classification tasks related to the detection of abusive uses of language in Italian. The outcomes suggest that MT-GAN-BERT is sustainable and generally improves on the plain adoption of multiple BERT-based models, with much lighter requirements in terms of annotated data and computational cost.
20th International Conference of the Italian Association for Artificial Intelligence, AIxIA 2021
Virtual Event
2021
20
International relevance
2022
Sector INF/01
Sector ING-INF/05
English
BERT
Generative Adversarial Learning
Multi-task learning
Semi-supervised learning
Sustainable NLP
Conference contribution
Breazzano, C., Croce, D., Basili, R. (2022). Multi-task and Generative Adversarial Learning for Robust and Sustainable Text Classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp.228-244). GEWERBESTRASSE 11, CHAM, CH-6330, SWITZERLAND : Springer Science and Business Media Deutschland GmbH [10.1007/978-3-031-08421-8_16].
Breazzano, C; Croce, D; Basili, R

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/359275