International Journal on Advanced Science, Engineering and Information Technology, Vol. 13 (2023) No. 1, pages: 380-391, DOI:10.18517/ijaseit.13.1.16706

A Dataset-Driven Parameter Tuning Approach for Enhanced K-Nearest Neighbour Algorithm Performance

Udoinyang G. Inyang, Funebi F. Ijebu, Francis B. Osang, Aderenle A. Afoluronsho, Samuel S. Udoh, Imo J. Eyoh

Abstract

The number of neighbours (k) and the distance measure (DM) are widely tuned to improve kNN performance. This work investigates the joint effect of these parameters, in conjunction with dataset characteristics (DC), on kNN performance. Five distance measures (Euclidean, Chebyshev, Manhattan, Minkowski, and Filtered), eleven k values, and four DC were systematically selected for the parameter tuning experiments. Each experiment used 20 iterations, 10-fold cross-validation, and thirty-three randomly selected datasets from the UCI repository. The results show that the average root mean squared error (RMSE) of kNN is significantly affected by the type of task (p<0.05, 14.53% variability effect). DC collectively caused a 74.54% change in mean RMSE values, while k and DM together accounted for the smallest effect, 25.4%. The interaction of k, DC, and DM yielded DM = 'Minkowski', 3 ≤ k ≤ 20, 7 ≤ target dimension ≤ 9, and sample size (SS) > 9000 as the optimal performance pattern for classification tasks. For regression problems, the experimental configuration should be 7000 ≤ SS ≤ 9000, 4 ≤ number of attributes ≤ 6, and DM = 'Filtered'. The type of task performed is the most influential determinant of kNN performance, followed by DM. Variation in kNN accuracy resulting from changes in k alone occurs only by chance, as it shows no consistent pattern, and the joint effect of k with the other parameters yielded a statistically insignificant change in mean accuracy (p>0.5). In further work, the discovered patterns will serve as a standard reference for comparative analysis of kNN performance against other classification and regression algorithms.
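The kind of tuning experiment described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the synthetic data, the helper names (`minkowski`, `knn_predict`, `cv_rmse`, `grid_search`, `make_toy_data`), and the k values are assumptions made for illustration. Manhattan, Euclidean, and Chebyshev are treated here as the p = 1, p = 2, and p = ∞ cases of the Minkowski distance, and the 'Filtered' distance is not reproduced.

```python
import math
import random

def minkowski(a, b, p):
    """Minkowski distance; p=1 is Manhattan, p=2 Euclidean, p=inf Chebyshev."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

def knn_predict(train_X, train_y, query, k, p):
    """Regression prediction: mean target of the k nearest training points."""
    order = sorted(range(len(train_X)),
                   key=lambda i: minkowski(train_X[i], query, p))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def cv_rmse(X, y, k, p, folds=10, seed=0):
    """RMSE of kNN pooled over the held-out points of a shuffled k-fold split."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    sq_err = 0.0
    for f in range(folds):
        test = idx[f::folds]              # every folds-th shuffled index
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        tr_X = [X[i] for i in train]
        tr_y = [y[i] for i in train]
        for i in test:
            pred = knn_predict(tr_X, tr_y, X[i], k, p)
            sq_err += (pred - y[i]) ** 2
    return math.sqrt(sq_err / len(X))

def make_toy_data(n=120, seed=1):
    """Hypothetical synthetic regression data (not a UCI dataset)."""
    rng = random.Random(seed)
    X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(n)]
    y = [x[0] + 2 * x[1] + rng.gauss(0, 0.05) for x in X]
    return X, y

def grid_search(X, y, ks, metrics):
    """Return {(metric name, k): RMSE} for every metric/k combination."""
    return {(name, k): cv_rmse(X, y, k, p)
            for name, p in metrics.items() for k in ks}

# Assumed metric set for illustration only.
METRICS = {"manhattan": 1, "euclidean": 2, "chebyshev": math.inf}

if __name__ == "__main__":
    X, y = make_toy_data()
    results = grid_search(X, y, ks=[1, 3, 5, 7], metrics=METRICS)
    for (name, k), rmse in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"{name:10s} k={k:2d} RMSE={rmse:.4f}")
```

Running the script prints one RMSE per (distance, k) pair, sorted from best to worst. Repeating this over many datasets and relating the scores to properties such as sample size or number of attributes would approximate the kind of analysis the abstract reports.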

Keywords:

kNN; kNN performance; k-Neighbours; parallel analysis; principal component analysis; kNN parameter tuning.
