Data Analysis and Explainable Machine Learning for Stunting Prediction
Keywords:
Stunting, Machine Learning, SHAP, Explainable AI, XGBoost
Abstract
Childhood stunting remains a critical global health concern, reflecting chronic malnutrition that affects both physical growth and long-term cognitive development. Despite ongoing interventions, early detection in many low- and middle-income countries is still hindered by limited resources and the absence of interpretable decision-support tools. This study aims to develop and evaluate an explainable machine learning framework to predict stunting among toddlers using simple anthropometric and demographic data, thereby supporting evidence-based public health interventions. Data were collected from 40,071 children aged 0–59 months in Jeneponto Regency, South Sulawesi, Indonesia, covering the period 2021–2024. Key features included age in months, gender, weight, and height, while stunting status served as the target variable. Several machine learning algorithms were implemented, including Logistic Regression, Support Vector Machine, Multilayer Perceptron, K-Nearest Neighbors, Decision Tree, Random Forest, XGBoost, and Convolutional Neural Network. Data preprocessing involved imputation of missing values, feature encoding, and an 80/20 train-test split, while model interpretability was achieved using SHAP (SHapley Additive exPlanations) to provide both global and local feature attributions. The experimental results show that XGBoost achieved the highest accuracy of 97.57%, followed closely by Random Forest (97.28%) and Decision Tree (96.62%). SHAP analysis revealed that height was the most influential predictor, followed by age, gender, and weight, providing actionable insights for early identification of at-risk children. Local SHAP force plots further enabled case-level interpretation, enhancing the trustworthiness of the model in clinical or community health applications. 
The novelty of this research lies in integrating high-performing machine learning models with explainable AI for stunting prediction using minimal, easily collected health features in a resource-limited context. This framework not only improves the accuracy and transparency of early stunting detection but also provides a scalable approach to strengthen nutrition surveillance systems, with potential to inform targeted interventions and reduce the long-term impacts of childhood malnutrition.
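To make the idea of additive feature attribution concrete, the sketch below computes exact Shapley values for a single child by enumerating every coalition of the four features named in the abstract. This is feasible only because there are four features; the SHAP library uses faster model-specific algorithms for the same quantity. The scoring function `toy_risk_score`, its coefficients, the baseline child, and the example values are all hypothetical stand-ins and are not the study's XGBoost model or data, so the resulting ranking describes the toy function only.

```python
from itertools import combinations
from math import factorial

# Feature names mirror the abstract; the column names themselves are assumed.
FEATURES = ["age_months", "gender", "weight_kg", "height_cm"]


def toy_risk_score(child):
    """Hypothetical risk score, NOT the study's trained model."""
    expected_height = 50.0 + 0.6 * child["age_months"]  # made-up growth line
    deficit = expected_height - child["height_cm"]
    score = 0.05 * deficit - 0.01 * child["weight_kg"]
    score += 0.02 if child["gender"] == 1 else 0.0  # arbitrary encoding
    return score


def coalition_value(model, instance, baseline, coalition):
    """Score with coalition features taken from the instance, rest from baseline."""
    mixed = {f: (instance[f] if f in coalition else baseline[f]) for f in FEATURES}
    return model(mixed)


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating all subsets of the other features."""
    n = len(FEATURES)
    phi = {}
    for feature in FEATURES:
        others = [f for f in FEATURES if f != feature]
        total = 0.0
        for size in range(n):
            # Classic Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = coalition_value(model, instance, baseline, set(subset))
                with_f = coalition_value(
                    model, instance, baseline, set(subset) | {feature}
                )
                total += weight * (with_f - without)
        phi[feature] = total
    return phi


# Hypothetical reference child and child to explain.
baseline = {"age_months": 24, "gender": 0, "weight_kg": 11.5, "height_cm": 85.0}
child = {"age_months": 36, "gender": 1, "weight_kg": 11.0, "height_cm": 84.0}

phi = shapley_values(toy_risk_score, child, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>10}: {value:+.4f}")
```

The attributions sum to the difference between the child's score and the baseline score. This additivity is what a local force plot visualizes: each feature pushes the prediction up or down from a reference value.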
License
Copyright (c) 2025 Journal of Artificial Intelligence and Legal Technology

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.