International Journal on Advanced Science, Engineering and Information Technology, Vol. 10 (2020) No. 6, pages: 2582-2592, DOI:10.18517/ijaseit.10.6.11236

Using Multiple Regression Model and RNN for Imputing the Missing Values of PM10 Datasets

Moamin Amer Hasan Alsaeegh, Osamah Basheer Shukur


Missing values in time series data are a scientific problem that should be addressed by imputing them with appropriate statistical techniques. The problem becomes more complex when the missing values occur in the dependent (response) variable. Particulate matter (PM10) is a time series used as a dependent variable to measure air pollution, while many other pollutants are used as independent variables. This study examines Malaysian datasets of PM10 and several climate-related pollutants. It aims to impute missing values in the dependent variable at different missing rates with minimum error. The independent variables were assumed to be complete, while missing values were introduced into the dependent variable at different rates and with different distributions. Multiple linear regression (MLR) was used as a traditional method to impute the missing values of PM10. A recurrent neural network (RNN) was then combined with MLR, and the resulting hybrid was also used to impute the missing values of PM10. The results show that the hybrid method outperformed MLR in imputing the missing values of PM10. In conclusion, the hybrid MLR-RNN method can impute the missing values of PM10 more accurately than traditional methods.
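The abstract does not give the estimation details of the MLR step. The following is a minimal sketch of regression-based imputation that assumes ordinary least squares with an intercept, fully observed predictors (as the study assumes), and missing values introduced completely at random, which is only one of the missingness patterns the study considers. The functions `mlr_impute` and `mask_at_random` and the synthetic data are hypothetical illustrations, not the authors' code. The RNN stage of the hybrid MLR-RNN method is not reproduced here.

```python
import numpy as np


def mlr_impute(X, y):
    """Fill NaN entries of y with fitted values from a multiple linear regression on X.

    X: (n, p) array of predictors, assumed fully observed.
    y: (n,) response array with np.nan marking missing values.
    Returns (y_filled, beta), where beta = [intercept, slope_1, ..., slope_p].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)
    # Design matrix with a leading intercept column.
    A = np.column_stack([np.ones(len(X)), X])
    # Ordinary least squares, fitted only on rows where the response is observed.
    beta, *_ = np.linalg.lstsq(A[observed], y[observed], rcond=None)
    y_filled = y.copy()
    # Replace each missing response with its regression prediction.
    y_filled[~observed] = A[~observed] @ beta
    return y_filled, beta


def mask_at_random(y, rate, rng):
    """Hypothetical helper: set a fraction `rate` of y to NaN, chosen uniformly at random."""
    y_missing = np.array(y, dtype=float)
    n_missing = int(round(rate * len(y)))
    idx = rng.choice(len(y), size=n_missing, replace=False)
    y_missing[idx] = np.nan
    return y_missing, idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    # Synthetic predictors standing in for the other pollutant series (assumption).
    X = rng.normal(size=(n, 3))
    true_beta = np.array([2.0, 1.5, -0.7, 0.3])  # intercept, then slopes
    y_true = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.5, size=n)

    # Evaluate imputation error at several missing rates.
    for rate in (0.05, 0.10, 0.20):
        y_missing, idx = mask_at_random(y_true, rate, rng)
        y_filled, _ = mlr_impute(X, y_missing)
        rmse = np.sqrt(np.mean((y_filled[idx] - y_true[idx]) ** 2))
        print(f"missing rate {rate:.0%}: RMSE on imputed points = {rmse:.3f}")
```

Because the error is measured only at the masked positions against the known true values, the RMSE reflects imputation quality directly. A hybrid along the lines the abstract describes would replace or refine these MLR predictions with a recurrent model, but how the two are combined is not stated in the abstract.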


Keywords: multiple linear regression; MLR; missing values; recurrent neural network; RNN.

