Forte, G., Mauro, D., Raimondi, M., Pantano, I., Gandolfo, S., Cauli, A., et al. (2025). ChatGPT vs rheumatologists: cross-sectional study on accuracy and patient perception of AI-generated information for psoriatic arthritis. Annals of the Rheumatic Diseases. https://doi.org/10.1016/j.ard.2025.11.012
ChatGPT vs rheumatologists: cross-sectional study on accuracy and patient perception of AI-generated information for psoriatic arthritis
Chimenti, Maria Sole
2025-12-12
Abstract
Objectives: Patients with rheumatic diseases frequently turn to online sources for medical information. Large language models, such as ChatGPT, may offer an accessible alternative to conventional patient-education resources; however, their reliability remains underexplored. We conducted an exploratory, descriptive comparison to examine whether ChatGPT-4 might provide responses comparable to those of experts. Methods: Seventy-six psoriatic arthritis (PsA) patients generated 32 questions (296 selections) grouped into 6 themes. Each question was answered by ChatGPT-4 and by 12 Italian PsA specialists (each drafted 2-3 answers). Fourteen clinicians rated the accuracy (1-5 Likert scale) and completeness (1-3 scale) of the AI- and human-generated answers. Interrater reliability was calculated, and mixed-effects ordinal logistic models were used to compare sources. In a separate arm, 67 PsA patients reviewed 16 randomly selected answer pairs and indicated their preference. Readability was assessed. No formal sample size calculation was performed; P values were descriptive and interpreted alongside effect sizes and 95% CIs. Results: Patients most frequently sought information on prognosis/comorbidities (54/76, 71.1%), therapy strategy (48/76, 63.2%), and treatment risks (38/76, 50.0%). Accuracy appeared comparable between ChatGPT and experts, but ChatGPT scored lower in completeness. Accuracy was lower for pregnancy/fertility questions, with no clearly relevant differences in other domains. ChatGPT answers were preferred 491/998 times (49.2%), clinician answers 343/998 times (34.4%), and no preference was expressed 164/998 times (16.4%; P < .001), with a relative preference for ChatGPT responses on prognosis and therapy. ChatGPT responses were, on average, more readable across indices. Conclusions: In this exploratory study, ChatGPT-4 appeared able to generate accurate and readable responses to PsA-related questions and was often preferred by patients.
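
The abstract names several quantitative steps without detail: readability indices, interrater reliability, and mixed-effects ordinal logistic comparisons. As an illustration only, the Python sketch below shows how two of those steps (a readability index and a chance-corrected agreement statistic) could be computed with common libraries; the sample answer text, the toy ratings matrix, and the choice of textstat and statsmodels are assumptions for demonstration, not the authors' actual analysis pipeline. The mixed-effects ordinal models described in the Methods would typically require a cumulative link mixed model fitted with dedicated software (for example, the R ordinal package) and are not reproduced here.

# Hypothetical sketch (not the study's analysis code): readability of an
# answer and interrater agreement on ordinal accuracy ratings.
import numpy as np
import textstat
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Readability of a single answer (higher Flesch Reading Ease = easier text).
answer = (
    "Psoriatic arthritis is a chronic inflammatory disease that affects "
    "the joints and skin. Treatment aims to control inflammation and "
    "protect joint function."
)
print("Flesch Reading Ease:", textstat.flesch_reading_ease(answer))
print("Gunning Fog index:  ", textstat.gunning_fog(answer))

# Interrater agreement: rows = answers, columns = raters, values = 1-5
# accuracy scores (toy data, 6 answers x 4 raters, for illustration only).
ratings = np.array([
    [5, 5, 4, 5],
    [3, 4, 3, 3],
    [4, 4, 4, 5],
    [2, 3, 2, 2],
    [5, 4, 5, 5],
    [3, 3, 4, 3],
])
table, _ = aggregate_raters(ratings)          # answers x rating-category counts
print("Fleiss kappa:", fleiss_kappa(table))   # chance-corrected agreement

In this sketch, a higher Flesch Reading Ease score indicates more accessible text (consistent with the abstract's claim that ChatGPT responses were more readable across indices), and Fleiss kappa summarizes agreement among the raters on the ordinal accuracy scale.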


