Macilenti, G., Stellato, A., & Fiorelli, M. (2024). Prompting is not all you need: evaluating GPT-4 performance on a real-world ontology alignment use case. Procedia Computer Science, 246, 1289-1298. https://doi.org/10.1016/j.procs.2024.09.557
Prompting is not all you need: evaluating GPT-4 performance on a real-world ontology alignment use case
Macilenti, Giulio; Stellato, Armando; Fiorelli, Manuel
2024-01-01
Abstract
Ontology Alignment (OA) is a complex, demanding and error-prone task, requiring the intervention of domain and Semantic Web experts. Automating the alignment process thus becomes a necessity, especially with large datasets, if only to produce a first input for human experts. Automated ontology alignment could benefit from the outstanding language ability of Large Language Models (LLMs), which could implicitly provide the background knowledge that has been the Achilles' heel of traditional alignment systems. However, this requires a correct evaluation of the performance of LLMs and an understanding of the best way to incorporate them into more specific tools. In this paper, we show that a naive prompting approach on the popular GPT-4 model can face several problems when transferred to real-world use cases. To this end, we replicated the methods of Norouzi et al. (2023), applied to the OAEI 2022 conference track, on a reference alignment between a pair of datasets (reduced versions of two popular thesauri: the European Commission's EuroVoc and TESEO, from the Italian Senate of the Republic) that has never been tested in OAEI evaluation campaigns. This reference alignment has several features common to real-world use cases: it has a larger size than those considered in the study we replicated; it is not published online and is therefore not subject to data contamination; and it involves relations between concepts that are more complex than simple equivalence. The replicated methods achieved a significantly lower performance on our reference alignment than on the OAEI 2022 conference track, suggesting that size, data contamination, and semantic complexity need to be considered when using LLMs for the alignment task.
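To make the idea of a naive prompting approach concrete, the sketch below illustrates one possible pairwise scheme: for each candidate pair of concept labels, the model is asked a direct yes/no equivalence question and its free-text reply is parsed into a match decision. This is only an illustrative sketch, not the exact prompts or pipeline of the paper or of Norouzi et al. (2023); the `send_to_llm` call is a hypothetical stand-in for a real LLM API.

```python
def build_prompt(label_a: str, label_b: str) -> str:
    """Naive equivalence-only prompt for a single pair of concept labels.

    Note: real alignment also needs non-equivalence relations (e.g. broader/
    narrower), which is exactly the complexity the abstract highlights.
    """
    return (
        "Do the following two concepts refer to the same thing? "
        "Answer only 'yes' or 'no'.\n"
        f"Concept 1: {label_a}\n"
        f"Concept 2: {label_b}"
    )


def parse_answer(reply: str) -> bool:
    """Map the model's free-text reply to a boolean match decision."""
    return reply.strip().lower().startswith("yes")


# Example with a stubbed model reply; a real run would replace the stub
# with a call to an LLM API (hypothetical `send_to_llm`).
prompt = build_prompt("public finance", "finanza pubblica")
stubbed_reply = "Yes, these concepts are equivalent."
print(parse_answer(stubbed_reply))  # → True
```

Note the cost implication of this design: the number of prompts grows quadratically with dataset size unless candidate pairs are pre-filtered, one reason larger real-world alignments stress this approach.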
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| 1-s2.0-S1877050924026036-main.pdf | Open access | Publisher's Version (PDF) | Creative Commons | 894.61 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


