Using Multiple Regression Model and RNN for Imputing the Missing Values of PM10 Datasets

Moamin Amer Hasan Alsaeegh (1), Osamah Basheer Shukur (2)
(1) Department of Statistics and Informatics, College of Computer Science and Mathematics, University of Mosul, Mosul, Iraq
(2) Department of Statistics and Informatics, College of Computer Science and Mathematics, University of Mosul, Mosul, Iraq
How to cite (IJASEIT):
Alsaeegh, Moamin Amer Hasan, and Osamah Basheer Shukur. “Using Multiple Regression Model and RNN for Imputing the Missing Values of PM10 Datasets”. International Journal on Advanced Science, Engineering and Information Technology, vol. 10, no. 6, Dec. 2020, pp. 2582-9, doi:10.18517/ijaseit.10.6.11236.
Missing values in time series data are a scientific problem that should be solved by imputing them with appropriate statistical techniques. The problem becomes more complex when the missing values occur in the dependent (response) variable. Particulate matter (PM10) is a time series used to measure air pollution and serves here as the dependent variable, while many other types of pollutants are used as independent variables. Malaysian datasets of PM10 and several climate pollutants are examined in this study. The study aims to impute the missing values of the dependent variable, at different missing rates, with minimum error. In this paper, the independent variables are assumed to be complete, while missing values were placed in the dependent variable at different rates and with different distributions. Multiple linear regression (MLR) is used as a traditional method to impute the missing values of PM10. A recurrent neural network (RNN) is then combined with MLR, and the resulting hybrid is also used to impute the missing values of PM10. The results show that the hybrid method outperformed MLR in imputing the missing values of PM10. In conclusion, the hybrid MLR-RNN method can impute the missing values of PM10 more accurately than traditional methods such as MLR.
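The MLR imputation step described in the abstract can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' code: the helper names (`fit_mlr`, `impute_mlr`, `mask_at_random`, `rmse`) are hypothetical, the data are synthetic rather than the Malaysian PM10 series, gaps are placed completely at random, and RMSE is assumed as the error measure. It covers only the MLR baseline, because the abstract does not say how the RNN is combined with MLR in the hybrid method.

```python
import math
import random


def fit_mlr(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    Solves the normal equations (Z^T Z) b = Z^T y, where Z is X with a
    leading column of ones, using Gauss-Jordan elimination with partial
    pivoting. Returns [b0, b1, ..., bk].
    """
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    # Augmented matrix [Z^T Z | Z^T y]
    A = [
        [sum(z[i] * z[j] for z in Z) for j in range(p)]
        + [sum(z[i] * t for z, t in zip(Z, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        # Eliminate this column from every other row
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


def predict(beta, row):
    """Evaluate the fitted regression at one row of predictors."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def mask_at_random(y, rate, seed=0):
    """Hide a fraction `rate` of the response values (None marks a gap).

    Assumption: gaps are placed completely at random; the paper also
    varies how the gaps are distributed, which this sketch does not model.
    """
    rng = random.Random(seed)
    hidden = set(rng.sample(range(len(y)), int(round(rate * len(y)))))
    return [None if i in hidden else v for i, v in enumerate(y)], hidden


def impute_mlr(X, y_obs):
    """Fit MLR on the complete rows, then predict the response where it is missing."""
    complete = [(x, v) for x, v in zip(X, y_obs) if v is not None]
    beta = fit_mlr([x for x, _ in complete], [v for _, v in complete])
    filled = [predict(beta, x) if v is None else v for x, v in zip(X, y_obs)]
    return filled, beta


def rmse(y_true, y_filled, idx):
    """Root-mean-square error over the hidden positions only (assumed metric)."""
    return math.sqrt(sum((y_true[i] - y_filled[i]) ** 2 for i in idx) / len(idx))


# Synthetic stand-in for the PM10 data: three hypothetical predictors
# and a response built from made-up coefficients plus Gaussian noise.
rng = random.Random(42)
X = [[rng.uniform(0, 50), rng.uniform(0, 30), rng.uniform(20, 35)] for _ in range(200)]
y = [10 + 0.8 * a + 1.5 * b - 0.4 * c + rng.gauss(0, 2) for a, b, c in X]

y_obs, hidden = mask_at_random(y, rate=0.2)
y_filled, beta = impute_mlr(X, y_obs)
print(f"imputed {len(hidden)} values, RMSE = {rmse(y, y_filled, hidden):.2f}")
```

Re-running with different values of `rate` gives a rough picture of how the baseline error changes with the missing rate, in the spirit of the comparison across missing rates that the abstract describes; that baseline error is what a hybrid method would need to improve on.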
