Diabetes Early Prediction Using Machine Learning and Ensemble Methods

Hyung-Ho Ha; Hangun Kim; Young Hyun Yu; Hyun Sim

doi:10.18517/ijaseit.15.2.20947


Hyung-Ho Ha ⁽¹⁾, Hangun Kim ⁽²⁾, Young Hyun Yu ⁽³⁾, Hyun Sim ⁽⁴⁾

(1) Department of Pharmacy, Sunchon National University, Republic of Korea

(2) Department of Pharmacy, Sunchon National University, Republic of Korea

(3) Department of Pharmacy, Sunchon National University, Republic of Korea

(4) Department of Smart Agriculture, Sunchon National University, Republic of Korea


How to cite (IJASEIT) :


H.-H. Ha, H. Kim, Y. H. Yu, and H. Sim, “Diabetes Early Prediction Using Machine Learning and Ensemble Methods”, Int. J. Adv. Sci. Eng. Inf. Technol., vol. 15, no. 2, pp. 363–375, Apr. 2025.


This study aims to develop and validate an enhanced early prediction model for diabetes using machine learning and ensemble techniques, in response to the rapid increase in diabetes prevalence and the associated healthcare burden. The research draws on diverse datasets, including the Pima Indian Diabetes Dataset, electronic health records from local hospitals, and wearable device data. Generative Adversarial Networks (GANs) are used for data augmentation to address class imbalance, while SHAP (SHapley Additive exPlanations) provides interpretability for model predictions, enhancing trust and understanding in clinical applications. The methodology compares the efficacy of several algorithms in diabetes prediction: Support Vector Machine (SVM), Random Forest, XGBoost, Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM) networks. Ensemble methods are then applied to improve the predictive accuracy, reliability, and applicability of the models. The models are evaluated on standard performance metrics (accuracy, precision, recall, and F1-score) across different configurations and combined approaches. Results indicate that ensemble methods significantly enhance predictive performance, achieving higher accuracy and precision than individual models. In particular, integrating deep learning techniques with traditional machine learning models provides substantial improvements in detecting early signs of Type 1 and Type 2 diabetes, drawing on insulin and C-peptide data. Explainable AI (XAI) techniques such as SHAP not only clarify model decisions but also assist in tailoring interventions and management strategies in clinical settings.
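As a rough illustration of the evaluation described above, the sketch below combines per-model probabilities by soft voting and computes accuracy, precision, recall, and F1-score. It is a minimal sketch, not the authors' pipeline: the abstract does not state which ensemble scheme was used, so soft voting is an assumption, and the function names (`soft_vote`, `binary_metrics`), the 0.5 threshold, and all probability values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def soft_vote(prob_by_model: Dict[str, Sequence[float]],
              threshold: float = 0.5) -> List[int]:
    """Average each model's predicted probability of diabetes, then threshold."""
    columns = list(prob_by_model.values())
    n_samples = len(columns[0])
    if any(len(col) != n_samples for col in columns):
        raise ValueError("all models must score the same samples")
    averaged = [sum(col[i] for col in columns) / len(columns)
                for i in range(n_samples)]
    return [1 if p >= threshold else 0 for p in averaged]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels and model outputs (illustrative values, not study data).
y_true = [1, 0, 1, 1, 0, 0]
probs = {
    "svm":           [0.9, 0.4, 0.3, 0.5, 0.9, 0.1],
    "random_forest": [0.7, 0.2, 0.6, 0.3, 0.5, 0.2],
    "xgboost":       [0.8, 0.6, 0.7, 0.4, 0.4, 0.3],
}

y_pred = soft_vote(probs)
metrics = binary_metrics(y_true, y_pred)
print(y_pred)
print(metrics)
```

In this toy data the averaged probabilities correct one SVM error (the third sample) but still produce one false positive and one false negative, which is why the four metrics are worth reporting together rather than accuracy alone.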


This work is licensed under a Creative Commons Attribution 4.0 International License.
