Hromei, C.D., Croce, D., Basili, R. (2023). Grounding End-to-End Pre-trained architectures for Semantic Role Labeling in multiple languages. INTELLIGENZA ARTIFICIALE, 17(2), 173-191 [10.3233/IA-230012].

Grounding End-to-End Pre-trained architectures for Semantic Role Labeling in multiple languages

Hromei C. D.;Croce D.;Basili R.
2023-01-01

Abstract

Situated natural language interaction between humans and robots is necessary for complex applications: communication here involves reference to the environment shared by the user and the robot. This paper proposes a transformer-based architecture that integrates spatial information about a semantic map of the environment (as a logical representation) with the input utterances. The generated interpretation is a logical form of the command that refers to the state of the world, produced through a single end-to-end process that is conditioned at each interaction on an explicit linguistic description of the environment. In this work, the end-to-end capability of the transformer is studied in light of multilingual applications, where the robot can be queried in different natural languages. The experimental results confirm the applicability of transformers to grounded human-robot interaction, with benefits in both portability across domains and achievable accuracy. Moreover, language-specific processing chains are shown to be preferable to large-scale multilingual models because of their better trade-off between accuracy and complexity. Overall, the proposed architecture outperforms previous approaches and paves the way for sustainable multilingual architectures.
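The idea of pairing each command with an explicit linguistic description of the environment can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Entity` class, the `describe_map` and `build_input` helpers, the sentence templates, and the `" # "` separator are all hypothetical. The sketch only shows how a semantic map might be verbalised and appended to an utterance before being passed to a sequence-to-sequence model; it does not call any model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One object in a hypothetical semantic map."""
    name: str      # identifier used in the map, e.g. "b1"
    category: str  # lexical type of the object, e.g. "book"


def describe_map(entities, relations):
    """Verbalise the map as short sentences (hypothetical templates)."""
    sentences = [f"{e.name} is a {e.category}" for e in entities]
    # Each relation is a (subject, spatial relation, object) triple.
    sentences += [f"{a} is {rel} {b}" for a, rel, b in relations]
    if not sentences:
        return ""
    return ". ".join(sentences) + "."


def build_input(utterance, entities, relations, sep=" # "):
    """Concatenate the command and the map description into one input string."""
    description = describe_map(entities, relations)
    if not description:
        return utterance
    return f"{utterance}{sep}{description}"


example_entities = [Entity("b1", "book"), Entity("t1", "table")]
example_relations = [("b1", "on", "t1")]
example_input = build_input("take the book", example_entities, example_relations)
print(example_input)
```

In a setup like this, the resulting string would be the single input to an encoder-decoder model trained to emit the logical form of the command, so that entity identifiers such as `b1` are available to the decoder when it grounds the arguments.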
2023
Published
International relevance
Article
Anonymous expert reviewers
Sector INF/01
Sector ING-INF/05
English
End to end sequence to sequence architectures; Grounded semantic role labeling; Human-robot interaction; Italian automatic interpretation; Robotics and perception
Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/359277