Machine learning outperformed logistic regression classification even with limit sample size: A model to predict pediatric HIV mortality and clinical progression to AIDS

IRIS

Logistic regression (LR) is the most common prediction model in medicine. In recent years, supervised machine learning (ML) methods have gained popularity. However, there are many concerns about ML utility for small sample sizes. In this study, we aim to compare the performance of 7 algorithms in the prediction of 1-year mortality and clinical progression to AIDS in a small cohort of infants living with HIV from South Africa and Mozambique. The data set (n = 100) was randomly split into 70% training and 30% validation set. Seven algorithms (LR, Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Naive Bayes (NB), Artificial Neural Network (ANN), and Elastic Net) were compared. The variables included as predictors were the same across the models including sociodemographic, virologic, immunologic, and maternal status features. For each of the models, a parameter tuning was performed to select the best-performing hyperparameters using 5 times repeated 10-fold cross-validation. A confusion-matrix was built to assess their accuracy, sensitivity, and specificity. RF ranked as the best algorithm in terms of accuracy (82,8%), sensitivity (78%), and AUC (0,73). Regarding specificity and sensitivity, RF showed better performance than the other algorithms in the external validation and the highest AUC. LR showed lower performance compared with RF, SVM, or KNN. The outcome of children living with perinatally acquired HIV can be predicted with considerable accuracy using ML algorithms. Better models would benefit less specialized staff in limited resources countries to improve prompt referral in case of high-risk clinical progression.

Rodriguez, S.d., Pascual, M.s., Oletto, A., Barnabas, S., Zuidewind, P., Dobbels, E., et al. (2022). Machine learning outperformed logistic regression classification even with limit sample size: A model to predict pediatric HIV mortality and clinical progression to AIDS. PLOS ONE, 17(10), 1-13 [10.1371/journal.pone.0276116].

Machine learning outperformed logistic regression classification even with limit sample size: A model to predict pediatric HIV mortality and clinical progression to AIDS

Rodriguez S. D.;Pascual M. S.;Oletto A.;Barnabas S.;Zuidewind P.;Dobbels E.;Danaviah S.;Behuhuma O.;Lain M. G.;Vaz P.;Luis S. F.;Nhampossa T.;Varela E. L.;Otwombe K.;Liberty A.;Violari A.;Maiga A. I.;Rossi P.;Giaquinto C.;Kuhn L.;Rojo P.;Tagarro A.

2022-10-14

Abstract

Logistic regression (LR) is the most common prediction model in medicine. In recent years, supervised machine learning (ML) methods have gained popularity. However, there are many concerns about ML utility for small sample sizes. In this study, we aim to compare the performance of 7 algorithms in the prediction of 1-year mortality and clinical progression to AIDS in a small cohort of infants living with HIV from South Africa and Mozambique. The data set (n = 100) was randomly split into 70% training and 30% validation set. Seven algorithms (LR, Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Naive Bayes (NB), Artificial Neural Network (ANN), and Elastic Net) were compared. The variables included as predictors were the same across the models including sociodemographic, virologic, immunologic, and maternal status features. For each of the models, a parameter tuning was performed to select the best-performing hyperparameters using 5 times repeated 10-fold cross-validation. A confusion-matrix was built to assess their accuracy, sensitivity, and specificity. RF ranked as the best algorithm in terms of accuracy (82,8%), sensitivity (78%), and AUC (0,73). Regarding specificity and sensitivity, RF showed better performance than the other algorithms in the external validation and the highest AUC. LR showed lower performance compared with RF, SVM, or KNN. The outcome of children living with perinatally acquired HIV can be predicted with considerable accuracy using ML algorithms. Better models would benefit less specialized staff in limited resources countries to improve prompt referral in case of high-risk clinical progression.

Scheda breve

Scheda completa

Scheda completa (DC)

	Data di pubblicazione
	
				14-ott-2022
			
	Status di pubblicazione
	
				Pubblicato
			
	DOI dell'articolo
	
				https://dx.doi.org/10.1371/journal.pone.0276116
			
	Rilevanza
	
				Rilevanza internazionale
			
	Tipo
	
				Articolo
			
	Referee
	
				Esperti anonimi
			
	Settore disciplinare dell'articolo (valido fino a 24/06/2024)
	
				Settore MED/38
			
	Lingua del contenuto
	
				English
			
	Citazione
	
				Rodriguez, S.d., Pascual, M.s., Oletto, A., Barnabas, S., Zuidewind, P., Dobbels, E., et al. (2022). Machine learning outperformed logistic regression classification even with limit sample size: A model to predict pediatric HIV mortality and clinical progression to AIDS. PLOS ONE, 17(10), 1-13 [10.1371/journal.pone.0276116].
			
	Tutti gli autori
	
						Rodriguez, Sd; Pascual, Ms; Oletto, A; Barnabas, S; Zuidewind, P; Dobbels, E; Danaviah, S; Behuhuma, O; Lain, Mg; Vaz, P; Luis, Sf; Nhampossa, T; Vare...espandi
						
	Tipologia
	
				Articolo su rivista
			
	Appare nelle tipologie:
	
				01 - Articolo su rivista

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/2108/379623

Citazioni

16

31

23

social impact