Large language models provide discordant information compared to ophthalmology guidelines

IRIS

This study evaluated the agreement of large language models (LLMs) with the Preferred Practice Patterns® (PPP) guidelines developed by the American Academy of Ophthalmology (AAO). Open questions based on the AAO PPP were submitted to five LLMs: GPT-o1 and GPT-4o by OpenAI, Claude 3.5 Sonnet by Anthropic, Gemini 1.5 Pro by Google, and DeepSeek-R1-Lite-Preview. Questions were classified as “open” or “confirmatory with positive/negative ground-truth answer”. Three blinded investigators classified each response as “concordant”, “undetermined”, or “discordant” with the AAO PPP. Undetermined and discordant answers were then analyzed for their potential to harm patients, and the number of responses citing peer-reviewed articles was recorded. In total, 147 questions were submitted to the LLMs. Concordant answers numbered 135 (91.8%) for GPT-o1, 133 (90.5%) for GPT-4o, 136 (92.5%) for Claude 3.5 Sonnet, 124 (84.4%) for Gemini 1.5 Pro, and 119 (81.0%) for DeepSeek-R1-Lite-Preview (P = 0.006). Gemini 1.5 Pro gave the highest number of harmful answers (n = 6, 4.1%), followed by DeepSeek-R1-Lite-Preview (n = 5, 3.4%). Gemini 1.5 Pro was also the most transparent model (86 references, 58.5%); the other LLMs referenced papers in 9.5–15.6% of their responses. LLMs can provide answers that are discordant with ophthalmology guidelines, potentially harming patients by delaying diagnosis or recommending suboptimal treatments.
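The percentages in the abstract follow from the reported counts and the 147 submitted questions. The short Python sketch below recomputes them as a consistency check. The counts are taken from the abstract; the variable and function names are illustrative only, and the sketch does not attempt to reproduce the reported P value, since the statistical test used is not stated here.

```python
# Recompute the percentages reported in the abstract from the raw counts.
# All counts come from the abstract; names are illustrative, not the authors' code.

TOTAL_QUESTIONS = 147

# Number of answers judged concordant with the AAO PPP, per model.
concordant = {
    "GPT-o1": 135,
    "GPT-4o": 133,
    "Claude 3.5 Sonnet": 136,
    "Gemini 1.5 Pro": 124,
    "DeepSeek-R1-Lite-Preview": 119,
}

# Number of answers judged potentially harmful (only two models are reported).
harmful = {
    "Gemini 1.5 Pro": 6,
    "DeepSeek-R1-Lite-Preview": 5,
}

# Number of Gemini 1.5 Pro responses that referenced peer-reviewed articles.
GEMINI_REFERENCED = 86


def percent(count, total=TOTAL_QUESTIONS):
    """Return count / total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


concordance_pct = {model: percent(n) for model, n in concordant.items()}
harmful_pct = {model: percent(n) for model, n in harmful.items()}
gemini_referenced_pct = percent(GEMINI_REFERENCED)

for model, pct in concordance_pct.items():
    print(f"{model}: {concordant[model]}/{TOTAL_QUESTIONS} concordant = {pct}%")
```

Each recomputed value can be compared directly with the figure quoted in the abstract for the same model.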

Taloni, A., Sangregorio, A.C., Alessio, G., Romeo, M.A., Coco, G., Busin, L., et al. (2025). Large language models provide discordant information compared to ophthalmology guidelines. Scientific Reports, 15(1) [10.1038/s41598-025-06404-z].

Taloni, A; Sangregorio, A C; Alessio, G; Romeo, M A; Coco, G; Busin, L Marie L; Sollazzo, A; Scorcia, V; Giannaccare, Giuseppe

2025-07-01

Publication date: 1 Jul 2025
Publication status: Published
Article DOI: https://dx.doi.org/10.1038/s41598-025-06404-z
Relevance: International
Type: Article
Peer review: Anonymous experts
Academic discipline of the article (valid from 9 May 2024): Sector MEDS-17/A - Diseases of the visual system
Content language: English
Keywords: AAO; American Academy of Ophthalmology; Artificial intelligence; Guidelines; Large language model; Preferred practice patterns
Citation: Taloni, A., Sangregorio, A.C., Alessio, G., Romeo, M.A., Coco, G., Busin, L., et al. (2025). Large language models provide discordant information compared to ophthalmology guidelines. Scientific Reports, 15(1) [10.1038/s41598-025-06404-z].
All authors: Taloni, A; Sangregorio, AC; Alessio, G; Romeo, MA; Coco, G; Busin, LML; Sollazzo, A; Scorcia, V; Giannaccare, G
Category: Journal article
Appears in categories: 01 - Journal article

Files in this item:

File	Access	Type	License	Size	Format
unpaywall-bitstream--987306875.pdf	Open access	Publisher's version (PDF)	Creative Commons	1.62 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2108/459066
