Fantozzi, I.C., Martuscelli, L., Schiraldi, M.M. (2025). AI vs. human performance in university assessments: a case study in production management. In E.R. Dominik T. Matt (Ed.), Manufacturing 2030: a perspective to future challenges in industrial production (pp. 127-137). Cham: Springer. doi: 10.1007/978-3-032-03722-0_11.
AI vs. human performance in university assessments: a case study in production management
Fantozzi I. C.; Martuscelli L.; Schiraldi M. M.
2025-01-01
Abstract
This study explores the application of Google NLM, an AI model uniquely trained on lecture audio, in a specialized engineering course on Production Management. The model was tested under real exam conditions and compared to the performance of 14 students from the 2022-2023 academic year. Results show that the AI consistently passed the exam, achieving an average score of 23.5/30, comparable to the student average of 23/30. While demonstrating strong consistency and factual recall, the AI struggled with numerical reasoning and applied problem-solving, particularly in inventory management and statistical decision-making. Key contributions include the first application of an audio-trained AI in engineering education and an analysis of AI performance in a highly technical domain. While not exceeding top human scores, the AI's stability suggests potential as a benchmarking tool for exam design and student assessment. Future research should explore multilingual training, hybrid audio-text learning, and domain-specific fine-tuning to enhance AI's role in academic evaluation.
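As a purely illustrative aside (not part of the paper), the kind of comparison the abstract describes can be sketched as follows: the AI's repeated exam attempts are summarized against the student cohort's scores. All numbers below are hypothetical placeholders, not the study's data.

from statistics import mean, stdev

# Hypothetical repeated AI exam attempts (out of 30)
ai_scores = [23.5, 23.5, 24.0, 23.0]

# Hypothetical scores for a cohort of 14 students (out of 30)
student_scores = [18, 21, 22, 23, 23, 24, 24, 24, 25, 25, 26, 27, 28, 30]

print(f"AI mean:      {mean(ai_scores):.1f}/30 (sd {stdev(ai_scores):.2f})")
print(f"Student mean: {mean(student_scores):.1f}/30 (sd {stdev(student_scores):.2f})")

The low spread of the AI's attempts relative to the cohort is what the abstract refers to as "stability", which motivates its suggested use as a benchmarking reference in exam design.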


