AI-Driven Prediction of Tuberculosis Treatment Failure in Subtropical Ecuador: A Retrospective Cohort Study Comparing Logistic Regression, Random Forest, and Artificial Neural Networks

DOI:

https://doi.org/10.70099/BJ/2026.03.01.2

Keywords:

Tuberculosis; Treatment failure; Treatment outcomes; Risk prediction; Machine learning; Logistic regression; Random forest; Artificial neural network; SMOTE; Retrospective cohort; Ecuador; Public health surveillance

Abstract

Tuberculosis (TB) remains a major global health challenge, and treatment failure undermines control efforts by prolonging transmission and increasing the risk of drug resistance. This study evaluated clinical and demographic predictors of TB treatment outcomes and compared statistical and machine-learning models for predicting treatment failure in a subtropical Ecuadorian setting. We conducted a retrospective cohort analysis of routinely collected program data from a Ministry of Health primary care facility (Augusto Egas Type C Health Center, Santo Domingo de los Tsáchilas, Ecuador) covering TB cases treated from 2002 to 2024. Adults aged 18–64 years were analyzed (n=922). Candidate predictors included age, sex, baseline weight, HIV screening status, TB type (pulmonary/extrapulmonary), employment status, and prior TB history (relapse, previous abandonment, previous failure, and loss to follow-up), among others. Models were trained and evaluated under two scenarios: (i) the original imbalanced dataset and (ii) a class-balanced training set generated with SMOTE to mitigate underrepresentation of the minority class (treatment failure). We compared binary logistic regression, random forest, and a two-hidden-layer artificial neural network using accuracy, sensitivity, specificity, precision, F1-score, balanced accuracy, Cohen’s kappa, and AUC-ROC. In the imbalanced scenario, the models were strongly biased toward treatment success, yielding very low or null sensitivity for failure detection. After SMOTE balancing, logistic regression achieved the most balanced performance (accuracy 0.711; sensitivity 0.44; specificity 0.7548; precision 0.2245; F1-score 0.2973; AUC-ROC 0.6591; kappa 0.1389), outperforming the random forest and the neural network in identifying failures. Overall discrimination remained moderate, indicating that additional predictive variables and external validation are needed before clinical deployment. These findings support interpretable baseline modeling as a practical foundation for risk stratification in TB programs in similar resource-constrained contexts.
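
As a reading aid, the relationship between the reported metrics can be checked with a few lines of Python. The sketch below is not the authors' analysis code. It takes only the sensitivity, specificity, and precision quoted in the abstract for logistic regression after SMOTE and recomputes the F1-score and the balanced accuracy from their standard definitions. The function names are illustrative assumptions.

```python
def f1_score(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of the per-class recalls for a binary outcome."""
    return (sensitivity + specificity) / 2


# Values quoted in the abstract (logistic regression, SMOTE scenario).
SENSITIVITY = 0.44
SPECIFICITY = 0.7548
PRECISION = 0.2245

f1 = f1_score(PRECISION, SENSITIVITY)
bal_acc = balanced_accuracy(SENSITIVITY, SPECIFICITY)

print(f"F1-score:          {f1:.4f}")       # 0.2973, as reported
print(f"Balanced accuracy: {bal_acc:.4f}")  # 0.5974, derived here
```

The recomputed F1-score agrees with the reported 0.2973. The balanced accuracy of about 0.60 is derived here from the reported rates rather than quoted from the abstract, and it is consistent with the authors' description of discrimination as moderate.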

Published

2026-02-12

How to Cite

Armijos-Hernández, A., & Sánchez-Pozo, N. N. (2026). AI-Driven Prediction of Tuberculosis Treatment Failure in Subtropical Ecuador: A Retrospective Cohort Study Comparing Logistic Regression, Random Forest, and Artificial Neural Networks. BioNatura Journal: Ibero-American Journal of Biotechnology and Life Sciences, 3(1). https://doi.org/10.70099/BJ/2026.03.01.2

Section

Research Articles