International Journal on Advanced Science, Engineering and Information Technology, Vol. 8 (2018) No. 1, pages: 18-29, DOI:10.18517/ijaseit.8.1.3490

Comparative Study of K-means and Fuzzy C-means Algorithms on The Breast Cancer Data

Ashutosh Kumar Dubey, Umesh Gupta, Sonal Jain

Abstract

Breast cancer is one of the most common forms of cancer worldwide. Research continues on detecting breast cancer at an early stage, when the chance of cure is very high. This work had two main objectives: first, to compare the performance of the k-means and fuzzy c-means (FCM) clustering algorithms; and second, to examine, from multiple points of view, combinations of different computational measures for the k-means and FCM algorithms that could potentially achieve better clustering accuracy. Both algorithms were applied to understand the impact of clustering on breast cancer data, using the breast cancer Wisconsin (BCW) dataset for the experiments. The execution of the k-means algorithm depends on the centroid, distance, split method, threshold, epoch, BCW attribute, and number of iterations, while FCM is executed on the basis of the fuzziness value and termination condition. For k-means, the combination of variance and same centroid gives a better outcome. The highest and lowest classification accuracies are (94.7%, 77.1%) and (94.4%, 88.5%) for foggy and random centroids, respectively. The overall average positive prediction accuracy obtained by this approach is approximately 92%. For FCM, the highest and lowest classification accuracies are (97.2%, 91.1%), (97.2%, 90.9%), (97.8%, 90.4%), and (97.1%, 90.2%) for different combinations of fuzziness and termination criteria. The average highest and lowest classification accuracies for the same combinations are (95.7%, 94.7%), (95.9%, 93.6%), (95.3%, 94.2%), and (95.6%, 93.7%). The k-means algorithm was faster and more consistent in computation time, as FCM required more time to carry out its fuzzy calculations and iterations.
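The abstract lists the factors that drive k-means but not its update rules. As a rough illustration only, a minimal k-means loop might look like the sketch below. It assumes Euclidean distance, initial centroids drawn at random from the data with a fixed seed, and a stopping rule based on a centroid-shift threshold plus an iteration cap. The names `kmeans`, `threshold`, `max_iter`, and `seed`, and the toy data, are hypothetical; the sketch does not reproduce the authors' split method, epoch, or variance settings.

```python
import math
import random


def kmeans(points, k, max_iter=100, threshold=1e-4, seed=0):
    """Minimal k-means with Euclidean distance (illustrative sketch).

    Initial centroids are k distinct data points chosen at random (seeded).
    Iteration stops when no centroid moves by `threshold` or more, or after
    `max_iter` iterations.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # empty cluster keeps its centroid
        # Largest distance any centroid moved in this iteration.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < threshold:
            break
    return centroids, labels


# Toy 2-D data with two well-separated groups (hypothetical, not BCW data).
points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]
centroids, labels = kmeans(points, k=2)
print(labels, centroids)
```

Changing `seed` changes the initial centroids, which is one simple way to observe how sensitive k-means results can be to the choice of starting centroids.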
The findings of this work provide an extensive understanding of the computational parameters used with the k-means and fuzzy c-means algorithms. The computational results indicate that the FCM algorithm was more effective and consistent than the k-means algorithm when executed with different iterations, fuzziness values, and termination criteria. FCM is therefore potentially more capable of classifying the BCW dataset, since classification accuracy matters more than computation time.
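For comparison, standard fuzzy c-means (Bezdek's formulation) can be sketched as below, with the fuzziness exponent `m` and a membership-change threshold `eps` standing in for the fuzziness value and termination criterion the abstract mentions. The random membership initialisation, the names `fuzzy_c_means`, `eps`, and `max_iter`, and the toy data are assumptions for illustration; this is not the authors' code.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Standard fuzzy c-means (illustrative sketch).

    m   -- fuzziness exponent (> 1); larger m gives softer memberships.
    eps -- termination threshold on the largest change in any membership.
    Returns (centers, u) where u[i][j] is the degree to which point j
    belongs to cluster i; each point's degrees sum to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so each point's degrees sum to 1.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        total = sum(u[i][j] for i in range(c))
        for i in range(c):
            u[i][j] /= total
    centers = []
    for _ in range(max_iter):
        # Centre update: weighted mean of the points with weights u^m.
        centers = []
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            sw = sum(w)
            centers.append([sum(w[j] * points[j][d] for j in range(n)) / sw
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1)).
        new_u = [[0.0] * n for _ in range(c)]
        for j in range(n):
            dists = [math.dist(points[j], centers[i]) for i in range(c)]
            if min(dists) == 0.0:
                # Point lies exactly on a centre: full membership there.
                new_u[dists.index(0.0)][j] = 1.0
                continue
            for i in range(c):
                new_u[i][j] = 1.0 / sum((dists[i] / dists[k]) ** (2.0 / (m - 1.0))
                                        for k in range(c))
        # Terminate once no membership changes by eps or more.
        delta = max(abs(new_u[i][j] - u[i][j]) for i in range(c) for j in range(n))
        u = new_u
        if delta < eps:
            break
    return centers, u


# Same hypothetical toy data as a k-means comparison (not BCW data).
points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]
centers, u = fuzzy_c_means(points, c=2, m=2.0, eps=1e-5)
# Harden the fuzzy result by taking each point's highest-membership cluster.
hard = [max(range(2), key=lambda i: u[i][j]) for j in range(len(points))]
print(hard, centers)
```

In this sketch the membership update computes c ratio terms for each of the c memberships of every point, which is one reason an FCM iteration costs more than a k-means assignment step.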

Keywords:

Breast cancer; breast cancer Wisconsin dataset; k-means; fuzzy c-means
