Hybrid Preprocessing Method for Support Vector Machine for Classification of Imbalanced Cerebral Infarction Datasets

Zuherman Rustam (1), Dea A. Utami (2), Rahmat Hidayat (3), Jacub Pandelaki (4), Widyo A. Nugroho (5)
(1) Department of Mathematics, University of Indonesia, 16424 Depok, Indonesia
(2) Department of Mathematics, University of Indonesia, 16424 Depok, Indonesia
(3) Department of Information Technology, Politeknik Negeri Padang, Padang, Sumatra Barat, Indonesia
(4) Department of Radiology, Cipto Mangunkusumo Hospital, Jakarta 10430, Indonesia
(5) Department of Radiology, Cipto Mangunkusumo Hospital, Jakarta 10430, Indonesia
How to cite (IJASEIT) :
Rustam, Zuherman, et al. “Hybrid Preprocessing Method for Support Vector Machine for Classification of Imbalanced Cerebral Infarction Datasets”. International Journal on Advanced Science, Engineering and Information Technology, vol. 9, no. 2, Apr. 2019, pp. 685-91, doi:10.18517/ijaseit.9.2.8615.
Cerebral infarction is one of the causes of ischemic stroke, and machine learning can support its detection because a CT scan alone is not sufficient for diagnosis. Support vector machine (SVM) is a machine learning method known for its high accuracy. However, SVM can produce suboptimal results when the data are imbalanced, because the resulting model is biased. Therefore, this study applies a hybrid preprocessing method for SVM to the classification of an imbalanced cerebral infarction dataset obtained from the Department of Radiology at Dr. Cipto Mangunkusumo Hospital. The hybrid method combines two sampling techniques that address class imbalance. Oversampling duplicates data in the class with few samples until it is balanced with the class with many samples, while undersampling reduces the data in the class with many samples until it is balanced with the class with few samples. The results of SVM with the hybrid method are compared with those of SVM with undersampling alone, SVM with oversampling alone, and SVM without any preprocessing of the imbalanced dataset. In our evaluations, SVM with the hybrid preprocessing method reached an accuracy of 94%.
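
The combination of undersampling and oversampling described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name hybrid_resample, the random selection of samples, and the choice of the midpoint between the two class counts as the common target size are all assumptions made for illustration, since the abstract does not specify them. The toy data are synthetic.

```python
import random
from collections import Counter


def hybrid_resample(samples, labels, seed=0):
    """Balance a two-class dataset by undersampling the majority class
    and oversampling (duplicating) the minority class.

    Assumption for this sketch: both classes are brought to the midpoint
    of the two original class counts. The abstract does not state the
    target size used in the study.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("this sketch handles exactly two classes")
    (majority, n_major), (minority, n_minor) = counts.most_common(2)
    target = (n_major + n_minor) // 2

    major_idx = [i for i, label in enumerate(labels) if label == majority]
    minor_idx = [i for i, label in enumerate(labels) if label == minority]

    # Undersampling: keep a random subset of the majority class.
    kept_major = rng.sample(major_idx, target)
    # Oversampling: keep every minority sample, then add random duplicates.
    extra_minor = rng.choices(minor_idx, k=target - n_minor)

    chosen = kept_major + minor_idx + extra_minor
    rng.shuffle(chosen)
    return [samples[i] for i in chosen], [labels[i] for i in chosen]


# Synthetic toy data: 90 samples of class 0 and 10 samples of class 1.
X = [[float(i)] for i in range(100)]
y = [0] * 90 + [1] * 10
X_bal, y_bal = hybrid_resample(X, y)
print(Counter(y_bal))  # both classes now contain 50 samples
```

The balanced lists could then be passed to any SVM implementation for training. Resampling would normally be applied to the training split only, so that the test set keeps the original class distribution.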
