A Review of Feature Selection Methods on Diabetes Mellitus Classification

This work is licensed under a Creative Commons Attribution 4.0 International License.