Comparison of Fuzzy C-Means, Fuzzy Kernel C-Means, and Fuzzy Kernel Robust C-Means to Classify Thalassemia Data

Zuherman Rustam (1), Annisa Kamalia (2), Rahmat Hidayat (3), Fajar Subroto (4), Aditya Suryansyah S (5)
(1) Department of Mathematics, University of Indonesia, Depok, 16242, Indonesia
(2) Department of Mathematics, University of Indonesia, Depok, 16242, Indonesia
(3) Department of Information Technology, Politeknik Negeri Padang, Padang, 25163, Indonesia
(4) Harapan Kita Children and Women’s Hospital, Jakarta, 11420, Indonesia
(5) Harapan Kita Children and Women’s Hospital, Jakarta, 11420, Indonesia
How to cite (IJASEIT) :
Rustam, Zuherman, et al. “Comparison of Fuzzy C-Means, Fuzzy Kernel C-Means, and Fuzzy Kernel Robust C-Means to Classify Thalassemia Data”. International Journal on Advanced Science, Engineering and Information Technology, vol. 9, no. 4, Aug. 2019, pp. 1205-10, doi:10.18517/ijaseit.9.4.9580.
Thalassemia is the most prevalent inherited blood disorder in Southeast Asia. Thalassemias are pathologies that derive from genetic defects of the globin genes, and they are considered a health burden for the world’s population. Thalassemia cannot be cured, but its occurrence can be prevented through early detection by screening. Screening aims to identify suspected unrecognised disease in a population that appears healthy and asymptomatic, using tests, examinations, or other procedures that can be applied quickly and easily to the target population. Thalassemia has been studied extensively; for example, earlier work evaluated classification accuracy on β-thalassemia data from Thailand using Bayesian networks and multinomial logistic regression. In this study, we compare the performance of Fuzzy C-Means (FCM), Fuzzy Kernel C-Means, and Fuzzy Kernel Robust C-Means (FKRCM) in classifying thalassemia data. The data come from Harapan Kita Children and Women’s Hospital, Jakarta, Indonesia, and consist of 82 samples from thalassemia patients and 68 non-thalassemia samples, 150 patient records in total, each with 11 features. Performance is reported as classification accuracy. FCM reaches 100% accuracy with 90% training data, FRCM reaches 100% with 90% training data, and FKRCM, the modified fuzzy method, reaches 100% with both 80% and 90% training data. These results indicate that, at these training proportions, Fuzzy C-Means, Fuzzy Robust C-Means, and Fuzzy Kernel Robust C-Means classify the thalassemia data from Indonesia perfectly.
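
To make the baseline method concrete, the following is a minimal sketch of standard Fuzzy C-Means used as a classifier with a 90% training split. It is not the authors' implementation, since the abstract does not describe their procedure. The data are synthetic stand-ins with the same shape as the study data (82 and 68 samples, 11 features), and the fuzzifier `m = 2`, the majority-vote mapping from clusters to classes, and the nearest-center prediction rule are all assumptions made for illustration. The kernel and robust variants are not shown.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters=2, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard FCM: alternate center and membership updates."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row is normalised to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        W = U ** m
        # Centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def label_centers(U, y, n_classes):
    """Assumed rule: each cluster takes the majority class of its members."""
    hard = U.argmax(axis=1)
    labels = np.zeros(U.shape[1], dtype=int)
    for k in range(U.shape[1]):
        members = y[hard == k]
        if members.size:
            labels[k] = np.bincount(members, minlength=n_classes).argmax()
    return labels


def predict(X, centers, center_labels):
    """Assumed rule: a sample takes the class of its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return center_labels[d2.argmin(axis=1)]


# Synthetic stand-in data (NOT the hospital data): two Gaussian groups.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(5.0, 1.0, (82, 11)), rng.normal(0.0, 1.0, (68, 11))])
y = np.array([1] * 82 + [0] * 68)

# 90% of the 150 samples for training, the rest for testing.
perm = rng.permutation(150)
n_train = 150 * 9 // 10
train, test = perm[:n_train], perm[n_train:]

centers, U = fuzzy_c_means(X[train])
center_labels = label_centers(U, y[train], n_classes=2)
accuracy = (predict(X[test], centers, center_labels) == y[test]).mean()
print(f"test accuracy: {accuracy:.2f}")
```

Because the synthetic groups are far apart, this toy run says nothing about accuracy on real clinical data; it only shows the mechanics of the membership and center updates and of the train/test evaluation.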