In recent years, the long-standing task of Ontology Alignment (OA) has been approached using Large Language Models (LLMs). Exploiting the extensive linguistic capabilities of LLMs for OA raises a critical challenge: the length of the text that the model has to process. This matters both because of the computational cost and because LLMs struggle to maintain specificity and precision when processing very long prompts. To address these challenges, many existing LLM-based OA systems apply a high-recall filter to reduce the search space before using the model as a high-accuracy evaluation function. These high-recall filters are often built from embeddings of the textual information contained in the ontology. In this context, we propose Semantically-Informed Similarity (SIS), a novel method for extracting and comparing such textual information that leverages all semantic relations defined as triples. For each concept, our method separately embeds the lists of words that represent the objects of a given predicate. We then compute the SIS score as the sum of the cosine similarities of these vectors. We evaluated our method on the OAEI Conference track using both SBERT and GloVe embeddings, and compared it against a baseline approach similar to those employed in existing systems. Our results show a significant performance improvement with the SIS method. With SBERT, the high-recall filter achieves a recall exceeding 90 percent for reasonable parameter settings. Furthermore, we show that the performance gain of our method correlates positively with the degree of structuring in the ontologies to be matched.
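The scoring step described in the abstract can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not state: each concept is represented as a mapping from predicate to the list of its object words, a predicate missing on either side contributes 0 to the sum, and the filter keeps the top `k` candidates. All function names (`cosine`, `make_toy_embedder`, `sis_similarity`, `top_k_candidates`) and the example concepts are hypothetical. The bag-of-words embedder is only a stand-in so the example runs without external models; the paper itself uses SBERT and GloVe embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def make_toy_embedder(vocabulary):
    """Return a bag-of-words embedder over a fixed vocabulary.

    Stand-in only: the paper uses SBERT and GloVe, not this.
    """
    index = {token: i for i, token in enumerate(sorted(vocabulary))}

    def embed(words):
        vec = [0.0] * len(index)
        for word in words:
            for token in word.lower().split():
                if token in index:
                    vec[index[token]] += 1.0
        return vec

    return embed


def sis_similarity(concept_a, concept_b, embed):
    """Sum, over shared predicates, of the cosine between per-predicate embeddings.

    Assumption: a predicate missing on either side contributes 0.
    """
    shared = concept_a.keys() & concept_b.keys()
    return sum(cosine(embed(concept_a[p]), embed(concept_b[p])) for p in shared)


def top_k_candidates(source, targets, embed, k):
    """High-recall filter: keep the k target concepts with the highest SIS score."""
    scored = [(name, sis_similarity(source, t, embed)) for name, t in targets.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical concepts: each maps a predicate to the list of its object words.
source = {"rdfs:label": ["Paper"], "rdfs:subClassOf": ["Document"]}
targets = {
    "Article": {"rdfs:label": ["Article", "Paper"], "rdfs:subClassOf": ["Document"]},
    "Reviewer": {"rdfs:label": ["Reviewer"], "rdfs:subClassOf": ["Person"]},
}
embed = make_toy_embedder({"paper", "article", "document", "reviewer", "person"})
print(top_k_candidates(source, targets, embed, k=1))
```

With this toy data, `Article` ranks first because it shares words with the source under both predicates, while `Reviewer` shares none. Because `embed` is passed in as a parameter, a real sentence-embedding model could be substituted without changing the scoring logic.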

Macilenti, G., Stellato, A., Fiorelli, M. (2025). SIS: leveraging semantically-informed similarity of text embeddings for enhanced ontology alignment. PROCEDIA COMPUTER SCIENCE, 270, 505-514 [10.1016/j.procs.2025.09.169].

SIS: leveraging semantically-informed similarity of text embeddings for enhanced ontology alignment

Macilenti, Giulio; Stellato, Armando; Fiorelli, Manuel
2025-01-01

2025
Published
International relevance
Article
Scientific committee
Sector INFO-01/A - Computer Science
Sector IINF-05/A - Information Processing Systems
English
Large Language Models
Ontology alignment
Text embeddings
Journal article
Files in this item:
File: 1-s2.0-S187705092502839X-main.pdf
Access: open access
Type: Publisher's version (PDF)
License: Creative Commons
Size: 881.57 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/443083