International Journal on Advanced Science, Engineering and Information Technology, Vol. 9 (2019) No. 2, List of accepted papers, DOI:10.18517/ijaseit.9.2.5040

Research on Missing Data Imputation Methods on Gene Expression

Fadilah Badari, RD Rohmat Saedudin, Zuraini Ali Shah, Shahreen Kasim, Seah Choon Sen, Maman Abdurohman


Microarray technologies allow the expression levels of thousands of genes to be monitored under a variety of conditions. Gene expression data are mostly accurate, but the data sets still contain errors, and microarray data in particular often have many missing values. The result of a microarray experiment is a large matrix of expression levels, with genes as rows and experimental conditions as columns, frequently with some values missing. These missing values can affect the results of visualization and analysis of gene expression. This motivates the use of machine learning methods that address the missing value problem by imputing values into the microarray data. Imputation replaces each missing value with an estimate based on information derived from the data set itself. In this research, K-nearest Neighbour, Local Least Square, Bayesian Principal Component Analysis, mean, and median imputation are used to impute missing values. The performance of each imputation method is evaluated using two classifiers: a support vector machine (SVM) and an artificial neural network (ANN). The results show that K-nearest Neighbour imputation gives the highest accuracy with SVM (0.9146), while Local Least Square imputation gives the best result with ANN (accuracy 0.8445). Overall, SVM achieves better accuracy than ANN after imputation.
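
As an illustration of three of the imputation methods named above, the following is a minimal sketch in Python with NumPy, not the authors' implementation (the keywords suggest the study itself used WEKA). It assumes genes are rows and experimental conditions are columns, as described in the abstract. The function names `impute_rows` and `knn_impute`, the per-gene (row-wise) mean and median, the distance measure, the value of `k`, and the toy matrix `expr` are all assumptions made for the example.

```python
import numpy as np


def impute_rows(X, stat=np.nanmean):
    """Fill each missing entry with a statistic (mean or median) of its own row.

    Assumption: the statistic is taken per gene (row), ignoring NaNs.
    """
    X = np.array(X, dtype=float)
    fill = stat(X, axis=1)                 # one fill value per gene
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = fill[rows]
    return X


def knn_impute(X, k=2):
    """Fill each missing entry with the average of the k nearest genes (rows).

    Assumption: distance between two genes is the root mean squared
    difference over the conditions observed in both of them.
    """
    X = np.array(X, dtype=float)
    out = X.copy()
    missing = np.isnan(X)
    for i in np.where(missing.any(axis=1))[0]:
        dists = []
        for j in range(X.shape[0]):
            if j == i:
                continue
            shared = ~missing[i] & ~missing[j]
            if shared.any():
                d = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
                dists.append((d, j))
        dists.sort()                       # nearest genes first
        for c in np.where(missing[i])[0]:
            # Use only neighbours that actually have a value in column c.
            donors = [X[j, c] for _, j in dists if not missing[j, c]][:k]
            if donors:
                out[i, c] = np.mean(donors)
    return out


# Toy expression matrix (hypothetical values): 4 genes x 4 conditions,
# with one missing value in the second gene.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, np.nan, 4.1],
    [5.0, 6.0, 7.0, 8.0],
    [0.9, 1.9, 2.9, 3.9],
])

mean_filled = impute_rows(expr, np.nanmean)
median_filled = impute_rows(expr, np.nanmedian)
knn_filled = knn_impute(expr, k=2)
```

On this toy matrix the K-nearest Neighbour estimate for the missing entry is 2.95, the average of the two most similar genes, whereas the row mean gives about 2.43 and the row median 2.1. The example only shows how the estimates can differ; it says nothing about which method classifies better on real data.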


data machine learning; imputation; classification; cancer; WEKA
