Perceptual context based spoken language understanding for human-robot interaction

Bastianelli, E

doi:10.58015/bastianelli-emanuele_phd2016-05

In the coming years, robots are expected to move out of industrial environments on a large scale and gradually enter our everyday lives. In particular, interactive robots are expected to become reliable assistants for humans, especially in hostile working conditions, e.g. rescue operations in hazardous environments. To this end, providing them with human-like interaction capabilities will be a key requirement. Natural language is probably the most flexible, intuitive and expressive interface humans use to communicate. If robots could exhibit an analogous interaction capability, they would be easily accessible to every user, a major step towards their integration into everyday life. In this direction, Spoken Language Understanding (SLU) is the field at the crossroads of Natural Language Processing and Audio Signal Processing. It studies how to interpret the meaning conveyed by a speech signal. Its aim is to provide machines with the capability of understanding spoken language, and it has been extensively studied in recent years in the context of Spoken Dialogue Systems and for the realization of Personal Digital Assistants. In this Thesis we study the problem of Spoken Language Understanding transposed to the Human-Robot Interaction field or, more generally, applied to Robotics. To this end, we consider a wide range of Machine Learning techniques that have proven successful in previous research on Natural Language Processing and Spoken Language Understanding but have not yet been applied to robotic platforms. Moreover, we advocate relying on linguistically sound theories of meaning representation within SLU computational frameworks, in order to develop approaches that are independent of the robotic system, the application and the domain. Such generality would lead to solutions that are reusable, comparable and easily adaptable.
In this regard, in this Thesis we also present HuRIC, a spoken resource gathered to overcome the lack of training and benchmarking data available for our research. Furthermore, the study of Spoken Language Understanding for robotic applications poses challenges beyond those addressed by previous research. Robots are physical entities that have to deal with a physical world. The language used to interact with them is therefore often grounded, as it refers to the operational environment the robot is acting in. Conversely, the environment plays a crucial role in the understanding of grounded commands. Following psycholinguistic theories on how perception affects language understanding in human cognitive processes, we explore the grounding problem of natural language and propose a framework that makes use of perceptual knowledge within the interpretation process of user commands. In this setting, grounding provides evidence that is injected into the learning process together with linguistic aspects, in order to jointly find the interpretation that best grounds onto the current environment. To extract perceptual knowledge through grounding, we exploit semantic maps: knowledge bases that map features of entities in the environment onto geometrical representations of the space, and that are acquired incrementally through different sources of perception. We then devise a mechanism to ground language over semantic maps, which is used both to extract grounded information and to ground the final interpretation into an executable command. The outcome of this study is synthesized in a perception-enhanced Spoken Language Understanding processing pipeline for robotic applications.

Bastianelli, E. (2016). Perceptual context based spoken language understanding for human-robot interaction [10.58015/bastianelli-emanuele_phd2016-05].