International Journal on Advanced Science, Engineering and Information Technology, Vol. 12 (2022) No. 2, pages: 515-520, DOI:10.18517/ijaseit.12.2.15487

Grid Search CV Implementation in Random Forest Algorithm to Improve Accuracy of Breast Cancer Data

Dimas Aryo Anggoro, Nur Aini Afdallah

Abstract

Breast cancer is the most common cancer in women and the second leading cause of death globally. Diagnosis plays an important role in choosing treatment strategies and protecting patient safety, which motivates the use of machine learning to predict the disease. This paper has three aims: to determine the best parameter values for a Random Forest (RF) classifier on breast cancer data using the Grid Search CV method, to classify the breast cancer data with the random forest algorithm, and to compare the accuracy obtained with and without Grid Search CV. Grid Search CV determines the optimal model parameters by evaluating every combination of candidate parameter values with cross-validation, so that the classifier can predict the test data reliably. The random forest tuned with Grid Search CV achieves the highest accuracy, 0.9545, whereas the random forest without Grid Search CV achieves 0.9480. Future research could apply Grid Search CV to the breast cancer dataset with other algorithms, such as Logistic Regression, XGBoost, and SVM, or apply the same algorithm to different datasets to test whether Grid Search CV consistently increases accuracy.
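The tuning-and-comparison workflow described above can be sketched with scikit-learn's `GridSearchCV` and `RandomForestClassifier`. This is a minimal sketch under stated assumptions, not the authors' code: it uses scikit-learn's bundled Wisconsin Diagnostic Breast Cancer dataset (`load_breast_cancer`) in place of the paper's data, and the parameter grid, 80/20 split, 5-fold cross-validation, and random seeds are illustrative choices.

```python
# Minimal sketch: tune a random forest with GridSearchCV and compare its test
# accuracy against a default-parameter random forest.
# Assumptions: scikit-learn's bundled breast cancer dataset stands in for the
# paper's data; the grid values, split, and seeds are illustrative only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Baseline: random forest with scikit-learn's default hyperparameters.
baseline = RandomForestClassifier(random_state=42)
baseline.fit(X_train, y_train)
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))

# Hypothetical grid: each of the 2 * 3 * 2 = 12 combinations is scored by
# 5-fold cross-validation on the training split only.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5, 10],
    "max_features": ["sqrt", "log2"],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
# With the default refit=True, the best combination is retrained on the
# whole training split and exposed as search.best_estimator_.
search.fit(X_train, y_train)
tuned_acc = accuracy_score(y_test, search.best_estimator_.predict(X_test))

print("best parameters:", search.best_params_)
print(f"accuracy without Grid Search CV: {baseline_acc:.4f}")
print(f"accuracy with Grid Search CV:    {tuned_acc:.4f}")
```

Because the test split is held out during the search, the final comparison measures generalization rather than cross-validation fit. Whether the tuned model beats the baseline on a given split depends on the data, the grid, and the seeds, so the printed accuracies will not necessarily match the values reported in the paper.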

Keywords:

Accuracy; breast cancer; grid search cv; random forest.
