Macilenti, G., Stellato, A., & Fiorelli, M. (2024). Prompting is not all you need: evaluating GPT-4 performance on a real-world ontology alignment use case. Procedia Computer Science, 246, 1289-1298. https://doi.org/10.1016/j.procs.2024.09.557
Prompting is not all you need: evaluating GPT-4 performance on a real-world ontology alignment use case
Macilenti, Giulio; Stellato, Armando; Fiorelli, Manuel
2024-01-01
Abstract
Ontology Alignment (OA) is a complex, demanding and error-prone task, requiring the intervention of domain and Semantic Web experts. Automating the alignment process thus becomes a necessity, especially with large datasets, if only to produce a first input for human experts. Automated ontology alignment could benefit from the outstanding language ability of Large Language Models (LLMs), which could implicitly provide the background knowledge that has been the Achilles' heel of traditional alignment systems. However, this requires a correct evaluation of the performance of LLMs and an understanding of the best way to incorporate them into more specific tools. In this paper, we show that a naive prompting approach on the popular GPT-4 model can face several problems when transferred to real-world use cases. To this end, we replicated the methods of Norouzi et al. (2023), applied to the OAEI 2022 conference track, on a reference alignment between a pair of datasets (reduced versions of two popular thesauri: the European Commission's EuroVoc and TESEO, from the Italian Senate of the Republic) that has never been tested in OAEI evaluation campaigns. This reference alignment has several features common to real-world use cases: it has a larger size than those considered in the study we replicated; it is not published online and is therefore not subject to data contamination; and it involves relations between concepts that are more complex than simple equivalence. The replicated methods achieved a significantly lower performance on our reference alignment than on the OAEI 2022 conference track, suggesting that size, data contamination, and semantic complexity need to be considered when using LLMs for the alignment task.
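To make the idea of a naive prompting approach concrete, the sketch below illustrates one possible pairwise scheme: for each candidate pair of concept labels, the model is asked a direct yes/no equivalence question and its free-text reply is parsed into a match decision. This is only an illustrative sketch, not the exact prompts or pipeline of the paper or of Norouzi et al. (2023); the `send_to_llm` call is a hypothetical stand-in for a real LLM API.

```python
def build_prompt(label_a: str, label_b: str) -> str:
    """Naive equivalence-only prompt for a single pair of concept labels.

    Note: real alignment also needs non-equivalence relations (e.g. broader/
    narrower), which is exactly the complexity the abstract highlights.
    """
    return (
        "Do the following two concepts refer to the same thing? "
        "Answer only 'yes' or 'no'.\n"
        f"Concept 1: {label_a}\n"
        f"Concept 2: {label_b}"
    )


def parse_answer(reply: str) -> bool:
    """Map the model's free-text reply to a boolean match decision."""
    return reply.strip().lower().startswith("yes")


# Example with a stubbed model reply; a real run would replace the stub
# with a call to an LLM API (hypothetical `send_to_llm`).
prompt = build_prompt("public finance", "finanza pubblica")
stubbed_reply = "Yes, these concepts are equivalent."
print(parse_answer(stubbed_reply))  # → True
```

Note the cost implication of this design: the number of prompts grows quadratically with dataset size unless candidate pairs are pre-filtered, one reason larger real-world alignments stress this approach.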
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| 1-s2.0-S1877050924026036-main.pdf | Open access | Publisher's Version (PDF) | Creative Commons | 894.61 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


