ChatGPT vs rheumatologists: cross-sectional study on accuracy and patient perception of AI-generated information for psoriatic arthritis

Chimenti, Maria Sole;
2025-12-12

Abstract

Objectives: Patients with rheumatic diseases frequently turn to online sources for medical information. Large language models, such as ChatGPT, may offer an accessible alternative to conventional patient-education resources; however, their reliability remains poorly explored. We conducted an exploratory, descriptive comparison to examine whether ChatGPT-4 might provide responses comparable to those of experts. Methods: Seventy-six psoriatic arthritis (PsA) patients generated 32 questions (296 selections) grouped into 6 themes. Each question was answered by ChatGPT-4 and by 12 Italian PsA specialists (each drafted 2-3 answers). Fourteen clinicians scored the accuracy and completeness of the AI- and human-generated answers, rating accuracy on a 1-5 Likert scale and completeness on a 1-3 scale. Interrater reliability was calculated, and mixed-effects ordinal logistic models were used to compare sources. In a separate arm, 67 PsA patients reviewed 16 randomly selected answer pairs and indicated their preference. Readability was assessed. No formal sample size calculation was performed; P values were descriptive and interpreted alongside effect sizes and 95% CIs. Results: Patients most frequently sought information on prognosis/comorbidities (54/76, 71.1%), therapy strategy (48/76, 63.2%), and treatment risks (38/76, 50.0%). Accuracy appeared comparable between ChatGPT and experts, but ChatGPT scored lower in completeness. Accuracy was lower for pregnancy/fertility, with no clear relevant differences in other domains. ChatGPT answers were chosen 491/998 times (49.2%), clinician answers 343/998 times (34.4%), and no preference 164/998 times (16.4%, P < .001), with a relative preference for ChatGPT responses in prognosis and therapy. ChatGPT responses were, on average, more readable across indices. Conclusions: In this exploratory study, ChatGPT-4 appeared able to generate accurate and readable responses to PsA-related questions and was often preferred by patients.
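As a rough illustration of two analyses the abstract mentions (readability indices and the patient-preference comparison), the Python sketch below computes standard readability scores for a sample answer and runs a chi-square goodness-of-fit test on the reported 998 preference selections. The `textstat` package, the sample text, and the uniform expected distribution are illustrative assumptions only; the paper does not specify the exact tools or test used.

```python
# Minimal sketch (not the authors' code) of two analyses described in the abstract.
# Requires the third-party packages `textstat` and `scipy`; the answer text is invented.

import textstat
from scipy.stats import chisquare

# Readability of a hypothetical English-language answer.
answer = (
    "Psoriatic arthritis is a chronic inflammatory disease that can affect "
    "the joints, skin, and nails. Treatment aims to control inflammation."
)
print("Flesch Reading Ease:", textstat.flesch_reading_ease(answer))
print("Flesch-Kincaid grade:", textstat.flesch_kincaid_grade(answer))
print("Gunning Fog index:", textstat.gunning_fog(answer))

# Preference counts reported in the abstract: ChatGPT 491, clinicians 343,
# no preference 164 (total 998), tested here against a uniform expectation
# (an assumption; the paper may have used a different comparison).
observed = [491, 343, 164]
stat, p = chisquare(observed)  # expected frequencies default to equal
print(f"chi-square = {stat:.1f}, P = {p:.2e}")  # P well below .001, consistent with the abstract
```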
12-Dec-2025
Published
International relevance
Article
Anonymous expert reviewers
Sector MED/16
Sector MEDS-09/C - Rheumatology
English
Forte, G., Mauro, D., Raimondi, M., Pantano, I., Gandolfo, S., Cauli, A., et al. (2025). ChatGPT vs rheumatologists: cross-sectional study on accuracy and patient perception of AI-generated information for psoriatic arthritis. ANNALS OF THE RHEUMATIC DISEASES [10.1016/j.ard.2025.11.012].
Forte, G; Mauro, D; Raimondi, M; Pantano, I; Gandolfo, S; Cauli, A; Guggino, G; Lubrano, E; Guiducci, S; Chimenti, Ms; Peluso, G; D'Agostino, Ma; Ramo...
Journal article
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/2108/442248
Citations
  • PMC 1
  • Scopus ND
  • Web of Science ND