International Journal on Advanced Science, Engineering and Information Technology, Vol. 12 (2022) No. 6, pages: 2266-2273, DOI:10.18517/ijaseit.12.6.16145

Impact of Feature Selection and Data Augmentation for Pregnancy Risk Detection in Indonesia

Setio Basuki, Muhammad Irfan, Yufis Azhar


This paper aims to develop an automatic system for pregnancy risk detection in Indonesia. Because pregnancy care is a sensitive field, the system requires a sophisticated approach to reach the required performance. Existing works were developed using small datasets and a limited set of classification features. Moreover, they treat all features equally, which makes it hard to interpret which features contribute most to the detection results. To address these issues, we propose combining more complex features, data augmentation methods, and feature selection techniques. We use all 118 pregnancy indicators and 400 instances collected from Puskesmas (community health centers) as the original dataset. Next, this dataset is used to build two data augmentation methods, i.e., a Gaussian mixture model (GMM) and CTGAN. Each data augmentation method generates 2,000 new synthetic instances. Following this, five machine learning methods combined with three feature selection approaches, i.e., recursive feature elimination (RFE), Random Forest, and Chi-Square, are applied to all datasets. Our experiments show that feature selection techniques play an essential role in improving classification accuracy. While the GMM-based augmentation improved performance, the CTGAN-based synthetic dataset yielded low performance. The best accuracy across all experimental settings was 95%. Random Forest combined with RFE on the GMM-based dataset achieved this accuracy using only five features. Another notable result is that both XGBoost and Decision Tree reached the same 95% accuracy on the GMM-based dataset using only nine features. Overall, the results show that appropriate data augmentation and feature selection are important for achieving better performance in this task.
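
The best-performing combination named in the abstract (GMM-based augmentation followed by Random Forest with RFE) can be illustrated with a minimal sketch in scikit-learn. This is not the authors' implementation. The data here is a random stand-in produced by `make_classification`, not the Puskesmas dataset, and the helper `gmm_augment`, the choice of one diagonal-covariance GMM per class, the number of mixture components, the train/test split, and the forest size are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split


def gmm_augment(X, y, n_new, n_components=2, seed=0):
    """Hypothetical helper: fit one GMM per class and sample synthetic rows,
    keeping roughly the original class proportions."""
    X_parts, y_parts = [], []
    classes, counts = np.unique(y, return_counts=True)
    for label, count in zip(classes, counts):
        n_label = int(round(n_new * count / len(y)))
        gmm = GaussianMixture(
            n_components=n_components,
            covariance_type="diag",  # assumption: keeps the fit stable on few rows
            random_state=seed,
        )
        gmm.fit(X[y == label])
        X_syn, _ = gmm.sample(n_label)
        X_parts.append(X_syn)
        y_parts.append(np.full(n_label, label))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Stand-in data: 400 instances as in the abstract, but only 20 random features
# (the real dataset has 118 pregnancy indicators).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Generate about 2,000 synthetic rows from the training split only,
# so the held-out rows never influence the augmentation model.
X_syn, y_syn = gmm_augment(X_train, y_train, n_new=2000)
X_aug = np.vstack([X_train, X_syn])
y_aug = np.concatenate([y_train, y_syn])

# Recursive feature elimination wrapped around a Random Forest,
# keeping five features as in the best reported setting.
selector = RFE(
    RandomForestClassifier(n_estimators=30, random_state=0),
    n_features_to_select=5,
    step=1,
)
selector.fit(X_aug, y_aug)

selected = np.flatnonzero(selector.support_)
accuracy = selector.score(X_test, y_test)
print("selected feature indices:", selected)
print("held-out accuracy:", accuracy)
```

Because the data is synthetic, the printed accuracy says nothing about the 95% figure reported in the abstract; the sketch only shows how the augmentation and selection steps fit together.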


CTGAN; data augmentation; feature selection; pregnancy risk detection.
