Macilenti, G., Stellato, A., Fiorelli, M. (2025). SIS: leveraging semantically-informed similarity of text embeddings for enhanced ontology alignment. Procedia Computer Science, 270, 505-514 [10.1016/j.procs.2025.09.169].
SIS: leveraging semantically-informed similarity of text embeddings for enhanced ontology alignment
Macilenti, Giulio; Stellato, Armando; Fiorelli, Manuel
2025-01-01
Abstract
In recent years, the long-standing task of Ontology Alignment (OA) has been approached using Large Language Models (LLMs). When taking advantage of the extensive linguistic capabilities of LLMs for OA, a critical challenge arises with respect to the length of text that the model has to process. This issue is significant both because of the computational cost and because of the difficulty LLMs face in maintaining specificity and precision when processing very long prompts. To address these challenges, many existing LLM-based systems for OA use a high-recall filter to reduce the search space before applying the model as a high-accuracy evaluation function. These high-recall filters are often constructed using embeddings of the textual information contained within the ontology. In this context, we propose Semantically-Informed Similarity (SIS), a novel method for extracting and comparing such textual information, leveraging all semantic relations defined as triples. Our method consists of separately embedding lists of words that, for each concept, represent the objects of a given predicate. We then compute the SIS similarity as the sum of the cosine similarities of these vectors. We evaluated our method on the OAEI conference track using both SBERT and GloVe embeddings, and compared it against a baseline approach similar to those employed in existing systems. Our results show a significant performance improvement using the SIS method. In the case of SBERT, the high-recall filter achieves remarkable results, with recall exceeding 90 percent for reasonable parameter settings. Furthermore, we show that the outperformance of our method correlates positively with the degree of structure in the ontologies to be matched.
| File | Type | License | Size | Format |
|---|---|---|---|---|
| 1-s2.0-S187705092502839X-main.pdf (open access) | Publisher's version (PDF) | Creative Commons | 881.57 kB | Adobe PDF |
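The SIS computation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the hand-made toy vectors stand in for the GloVe/SBERT embeddings used in the paper, the averaging-based `embed` function and the example predicates (`rdfs:label`, `rdfs:comment`) are assumptions made only for this sketch, and the restriction to predicates shared by both concepts is one plausible reading of the method.

```python
import math

# Toy word vectors standing in for pretrained GloVe/SBERT embeddings
# (assumption: the real method uses pretrained embeddings, not these).
TOY_VECTORS = {
    "conference": [1.0, 0.2, 0.0],
    "meeting":    [0.9, 0.3, 0.1],
    "paper":      [0.1, 1.0, 0.2],
    "article":    [0.2, 0.9, 0.3],
}

def embed(words):
    """Embed a word list by averaging its word vectors (a simple stand-in)."""
    dims = len(next(iter(TOY_VECTORS.values())))
    acc = [0.0] * dims
    for w in words:
        for i, x in enumerate(TOY_VECTORS.get(w, [0.0] * dims)):
            acc[i] += x
    n = max(len(words), 1)
    return [x / n for x in acc]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def sis_similarity(concept_a, concept_b):
    """SIS score: sum, over shared predicates, of the cosine similarity
    between the embedded object-word lists of the two concepts.
    Each concept is a dict mapping predicate -> list of object words."""
    shared = set(concept_a) & set(concept_b)
    return sum(cosine(embed(concept_a[p]), embed(concept_b[p])) for p in shared)

# Hypothetical concepts from two conference ontologies.
a = {"rdfs:label": ["conference", "paper"], "rdfs:comment": ["article"]}
b = {"rdfs:label": ["meeting", "article"], "rdfs:comment": ["paper"]}
print(round(sis_similarity(a, b), 3))
```

Because the score is a sum of per-predicate cosine similarities, concepts that agree on more predicates accumulate a higher SIS value, which is what makes the measure usable as a high-recall filter before the LLM-based evaluation step.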
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


