Macilenti, G., Stellato, A., Fiorelli, M. (2025). SIS: leveraging semantically-informed similarity of text embeddings for enhanced ontology alignment. Procedia Computer Science, 270, 505-514 [10.1016/j.procs.2025.09.169].
SIS: leveraging semantically-informed similarity of text embeddings for enhanced ontology alignment
Macilenti, Giulio; Stellato, Armando; Fiorelli, Manuel
2025-01-01
Abstract
In recent years, the long-standing task of Ontology Alignment (OA) has been approached using Large Language Models (LLMs). When taking advantage of the extensive linguistic capabilities of LLMs for OA, a critical challenge arises with respect to the length of text that the model has to process. This issue is significant both because of the computational cost and because of the difficulty LLMs face in maintaining specificity and precision when processing very long prompts. To address these challenges, many existing LLM-based systems for OA use a high-recall filter to reduce the search space before applying the model as a high-accuracy evaluation function. These high-recall filters are often constructed using embeddings of the textual information contained within the ontology. In this context, we propose Semantically-Informed Similarity (SIS), a novel method for extracting and comparing such textual information, leveraging all semantic relations defined as triples. Our method consists of separately embedding lists of words that, for each concept, represent the objects of a given predicate. We then compute the SIS similarity as the sum of the cosine similarities of these vectors. We evaluated our method on the OAEI conference track using both SBERT and GloVe embeddings, and compared it against a baseline approach similar to those employed in existing systems. Our results show a significant performance improvement using the SIS method. In the case of SBERT, the high-recall filter achieves remarkable results, with recall exceeding 90 percent for reasonable parameter settings. Furthermore, we show that the outperformance of our method correlates positively with the degree of structure in the ontologies to be matched.
| File | Type | License | Size | Format |
|---|---|---|---|---|
| 1-s2.0-S187705092502839X-main.pdf (open access) | Publisher's version (PDF) | Creative Commons | 881.57 kB | Adobe PDF |
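The SIS computation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the hand-made toy vectors stand in for the GloVe/SBERT embeddings used in the paper, the averaging-based `embed` function and the example predicates (`rdfs:label`, `rdfs:comment`) are assumptions made only for this sketch, and the restriction to predicates shared by both concepts is one plausible reading of the method.

```python
import math

# Toy word vectors standing in for pretrained GloVe/SBERT embeddings
# (assumption: the real method uses pretrained embeddings, not these).
TOY_VECTORS = {
    "conference": [1.0, 0.2, 0.0],
    "meeting":    [0.9, 0.3, 0.1],
    "paper":      [0.1, 1.0, 0.2],
    "article":    [0.2, 0.9, 0.3],
}

def embed(words):
    """Embed a word list by averaging its word vectors (a simple stand-in)."""
    dims = len(next(iter(TOY_VECTORS.values())))
    acc = [0.0] * dims
    for w in words:
        for i, x in enumerate(TOY_VECTORS.get(w, [0.0] * dims)):
            acc[i] += x
    n = max(len(words), 1)
    return [x / n for x in acc]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def sis_similarity(concept_a, concept_b):
    """SIS score: sum, over shared predicates, of the cosine similarity
    between the embedded object-word lists of the two concepts.
    Each concept is a dict mapping predicate -> list of object words."""
    shared = set(concept_a) & set(concept_b)
    return sum(cosine(embed(concept_a[p]), embed(concept_b[p])) for p in shared)

# Hypothetical concepts from two conference ontologies.
a = {"rdfs:label": ["conference", "paper"], "rdfs:comment": ["article"]}
b = {"rdfs:label": ["meeting", "article"], "rdfs:comment": ["paper"]}
print(round(sis_similarity(a, b), 3))
```

Because the score is a sum of per-predicate cosine similarities, concepts that agree on more predicates accumulate a higher SIS value, which is what makes the measure usable as a high-recall filter before the LLM-based evaluation step.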
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


