Impact of Feature Selection and Data Augmentation for Pregnancy Risk Detection in Indonesia

Setio Basuki (1), Muhammad Irfan (2), Yufis Azhar (3)
(1) Informatics Engineering, Universitas Muhammadiyah Malang, Indonesia
(2) Informatics Engineering, Universitas Muhammadiyah Malang, Indonesia
(3) Informatics Engineering, Universitas Muhammadiyah Malang, Indonesia
How to cite (IJASEIT):
Basuki, Setio, et al. “Impact of Feature Selection and Data Augmentation for Pregnancy Risk Detection in Indonesia”. International Journal on Advanced Science, Engineering and Information Technology, vol. 12, no. 6, Dec. 2022, pp. 2266-73, doi:10.18517/ijaseit.12.6.16145.
This paper aims to develop an automatic system for pregnancy risk detection in Indonesia. Because pregnancy care is a sensitive field, the system requires a sophisticated approach to reach the required performance. Existing works were developed using small datasets and a limited set of classification features. Moreover, they treat all features equally, which makes it hard to interpret which features contribute most to the detection results. To address these issues, we propose combining more complex features, data augmentation methods, and feature selection techniques. We use all 118 pregnancy indicators and 400 instances from Puskesmas (community health centers) as the original dataset. Next, the original dataset is used to build two data augmentation models, i.e., GMM (Gaussian Mixture Model) and CTGAN (Conditional Tabular GAN). Each data augmentation method generates 2,000 new synthetic instances. Following this, five machine learning methods combined with three feature selection approaches, i.e., RFE (Recursive Feature Elimination), Random Forest, and Chi-Square, are applied to all datasets. Through experiments, we observed that feature selection techniques play an essential role in improving classification accuracy. While the GMM-based augmentation improved performance, the CTGAN-based synthetic dataset yielded low performance. The best accuracy across all experimental settings was 95%. Random Forest combined with RFE reached this accuracy on the GMM-based dataset using only five features. Another notable result is that both XGBoost and Decision Tree reached the same 95% accuracy on the GMM-based dataset with only nine features. Overall, the results show that appropriate data augmentation and feature selection matter for achieving better performance in this task.
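The pipeline described in the abstract (GMM-based augmentation followed by RFE with a Random Forest) can be sketched roughly as follows. This is an illustrative sketch with scikit-learn, not the authors' implementation. The data are a synthetic stand-in from `make_classification` with 20 features rather than the 118 Puskesmas indicators. The helper `gmm_augment`, the per-class fitting, the component count, the diagonal covariance, and the choice to hold out a test set before augmenting are all assumptions made for this example. A GMM also treats every feature as continuous, so categorical indicators would need separate handling.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split


def gmm_augment(X, y, n_new, n_components=2, seed=0):
    """Hypothetical helper: fit one GMM per class and sample synthetic rows
    in proportion to each class's share of the data."""
    classes, counts = np.unique(y, return_counts=True)
    X_parts, y_parts = [], []
    for cls, count in zip(classes, counts):
        n_cls = int(round(n_new * count / len(y)))
        gmm = GaussianMixture(
            n_components=n_components,
            covariance_type="diag",  # assumption: keeps the fit stable on small data
            random_state=seed,
        )
        gmm.fit(X[y == cls])
        X_syn, _ = gmm.sample(n_cls)  # second return value is the component index
        X_parts.append(X_syn)
        y_parts.append(np.full(n_cls, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Toy stand-in for the original dataset: 400 instances, two risk classes.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5,
    n_redundant=0, flip_y=0.0, random_state=0,
)

# Hold out real instances for evaluation before any augmentation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Generate about 2,000 synthetic instances and append them to the training data.
X_syn, y_syn = gmm_augment(X_train, y_train, n_new=2000)
X_aug = np.vstack([X_train, X_syn])
y_aug = np.concatenate([y_train, y_syn])

# Recursive Feature Elimination with a Random Forest, keeping five features.
selector = RFE(
    RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=5,
    step=1,
)
selector.fit(X_aug, y_aug)

# Retrain on the selected features and score on the held-out real data.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(selector.transform(X_aug), y_aug)
acc = clf.score(selector.transform(X_test), y_test)

print("selected feature indices:", np.flatnonzero(selector.support_))
print("held-out accuracy:", round(acc, 3))
```

Scoring only on held-out real instances keeps synthetic rows out of the evaluation, so any accuracy change reflects how the augmented training set generalizes rather than how well the classifier fits the generator.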

