Rainfall Prediction Using Statistical Downscaling Based on Support Vector Machine in Selangor

Nur Farah Amieera Mat Hussin (1), Shazlyn Milleana Shaharudin (2), Nurul Ainina Filza Sulaiman (3), Noor Hamizah Mohamad Sani (4), Sumayyah Aimi Mohd Najib (5), Hairulnizam Mahdin (6), Mohd Saiful Samsudin (7), Rasyidah (8)
(1) Department of Mathematics, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, Perak, Malaysia
(2) Department of Mathematics, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, Perak, Malaysia
(3) Department of Mechanical & Manufacturing, Kolej Vokasional Besut, Kampung Raja, Besut, Terengganu, Malaysia
(4) Department of Mathematics, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, Perak, Malaysia
(5) Faculty of Human Sciences, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
(6) Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Johor, Malaysia
(7) Environmental Technology Division, School of Industrial Technology, Universiti Sains Malaysia, Gelugor, Penang, Malaysia
(8) Department of Information Technology, Politeknik Negeri Padang, Sumatera Barat, Indonesia
How to cite (IJASEIT):
N. F. A. Mat Hussin, “Rainfall Prediction Using Statistical Downscaling Based on Support Vector Machine in Selangor”, Int. J. Adv. Sci. Eng. Inf. Technol., vol. 15, no. 1, pp. 345–352, Feb. 2025.
Global climate change is widely discussed in the literature because it may increase the intensity and frequency of extreme events such as floods and droughts. In this study, daily rainfall in Selangor was predicted using a statistical downscaling model based on the Support Vector Machine (SVM), a machine learning technique. Atmospheric data (predictors) and daily rainfall data (predictand) covering 2008 to 2018 were collected, and missing values were first handled with five imputation methods: mean imputation, K-nearest neighbor (KNN), the Nonlinear Iterative Partial Least Squares (NIPALS) algorithm, Markov Chain Monte Carlo (MCMC) multiple imputation, and the Expectation Maximization (EM) algorithm. Predictors were then selected using Principal Component Analysis (PCA), which efficiently reduced the data dimension while retaining the key variability. Next, the SVM hyperparameters gamma (γ), cost (C), and epsilon (ε) were determined using K-fold cross-validation. With these values fixed, the performance of SVM as a regression model was measured for four kernel types: linear, radial basis function (RBF), polynomial, and sigmoid. The model with the lowest root mean square error (RMSE) was considered the best, since a lower RMSE indicates a smaller difference between estimated and observed values. The SVM model with the RBF kernel outperformed the others, yielding the lowest RMSE during both calibration (13.95071) and validation (12.60423). The best-fitting parameter set was C = 4.00, γ = 1.935, and ε = 0.2. Nevertheless, the results indicate that SVM performance is limited on this dataset. Future studies should explore more advanced imputation techniques and extend the methodology to other tropical climates to assess its broader applicability.
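To make the imputation step concrete, the following is a minimal Python sketch of two of the five methods named above, mean imputation and K-nearest neighbor imputation, on a toy table where None marks a missing value. It is not the authors' implementation: the function names, the toy data, the choice of k, and the use of Euclidean distance over the observed columns are all illustrative assumptions, and the NIPALS, MCMC, and EM methods are not shown.

```python
import math


def mean_impute(data):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(data[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def knn_impute(data, k=2):
    """Fill each None with the column mean of the k nearest complete rows.

    Distance is Euclidean, computed only over the columns observed in the
    incomplete row (an illustrative choice, not taken from the study).
    """
    complete = [row for row in data if None not in row]
    result = []
    for row in data:
        if None not in row:
            result.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]
        # (distance, index) pairs; ties are broken by row order.
        ranked = sorted(
            (math.dist([row[j] for j in observed], [other[j] for j in observed]), i)
            for i, other in enumerate(complete)
        )
        nearest = [complete[i] for _, i in ranked[:k]]
        result.append([
            sum(nb[j] for nb in nearest) / len(nearest) if v is None else v
            for j, v in enumerate(row)
        ])
    return result


# Hypothetical toy table: rows are days, columns are two predictors.
data = [
    [1.0, 10.0],
    [2.0, None],
    [3.0, 30.0],
    [None, 40.0],
    [10.0, 100.0],
]
print(mean_impute(data))      # [[1.0, 10.0], [2.0, 45.0], [3.0, 30.0], [4.0, 40.0], [10.0, 100.0]]
print(knn_impute(data, k=2))  # [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [2.0, 40.0], [10.0, 100.0]]
```

Mean imputation ignores relationships between predictors, whereas KNN borrows values from days that look similar on the observed predictors, which is why the two methods fill the same gaps differently.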
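The PCA-based predictor selection can be sketched in NumPy as below. The abstract does not say how many components were retained, so this sketch assumes Kaiser's rule (keep components whose correlation-matrix eigenvalue exceeds 1); the function name `pca_kaiser` and the randomly generated predictors are hypothetical.

```python
import numpy as np


def pca_kaiser(X):
    """PCA on standardised predictors, keeping components with eigenvalue > 1."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # z-scores per predictor
    corr = np.cov(Z, rowvar=False)                     # correlation matrix of X
    eigvals, eigvecs = np.linalg.eigh(corr)            # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                  # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                               # Kaiser criterion (assumed rule)
    scores = Z @ eigvecs[:, keep]                      # retained PC scores = new predictors
    explained = eigvals / eigvals.sum()                # share of total variance per PC
    return scores, eigvals, explained


# Hypothetical predictors: two strongly correlated pairs plus one independent column.
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 200))
X = np.column_stack([a, a + 0.1 * rng.normal(size=200), b, b + 0.1 * rng.normal(size=200), c])

scores, eigvals, explained = pca_kaiser(X)
print("eigenvalues:", np.round(eigvals, 3))
print("retained components:", scores.shape[1])
print("variance explained by retained PCs:", round(float(explained[: scores.shape[1]].sum()), 3))
```

Each correlated pair collapses into one component with an eigenvalue near 2, which illustrates how PCA reduces dimension while keeping most of the variability.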
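The roles of γ, ε, and the four kernel types become clearer from their definitions. The sketch below writes the kernels in the parameterization used by LIBSVM and scikit-learn, with `coef0` and `degree` set to those libraries' defaults (0 and 3), an assumption since the abstract does not report them. It also shows the ε-insensitive loss used in SVM regression and the RMSE criterion, RMSE = sqrt((1/n) Σ (observed − estimated)²), used to rank the models. All function names are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Kernels in the LIBSVM / scikit-learn parameterisation.
def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma, coef0=0.0, degree=3):
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma):
    # Larger gamma makes similarity decay faster with distance (a more local fit).
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def sigmoid_kernel(u, v, gamma, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    # Residuals inside the epsilon tube cost nothing in SVM regression.
    return max(0.0, abs(y_true - y_pred) - epsilon)


def rmse(observed, estimated):
    # Root mean square error: sqrt(mean((observed - estimated)^2)).
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


u, v = [0.0, 0.0], [1.0, 1.0]
print(rbf_kernel(u, v, gamma=1.935))                      # exp(-1.935 * 2)
print(epsilon_insensitive_loss(10.0, 10.1, epsilon=0.2))  # 0.0, inside the tube
print(epsilon_insensitive_loss(10.0, 11.0, epsilon=0.2))  # about 0.8
print(rmse([0.0, 5.0, 12.0], [1.0, 3.0, 12.0]))           # sqrt(5/3)
```

A larger ε tolerates more error before the model is penalized, while C (cost) controls how strongly errors outside the tube are penalized.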
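The two-stage procedure described above (tune cost, gamma, and epsilon by K-fold cross-validation, then compare kernels with those values fixed) might look like the following scikit-learn sketch, where scikit-learn's `C` plays the role of cost. Several details are assumptions rather than the study's settings: the synthetic data standing in for PC scores and rainfall, the 70/30 chronological calibration/validation split, K = 5, tuning with the RBF kernel, and the candidate grid, which includes the reported C = 4.00, γ = 1.935, ε = 0.2 only as candidates. The printed RMSE values will not match the reported ones.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def rmse(y_true, y_pred):
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


# Synthetic stand-in data: X plays the role of retained PC scores, y of daily rainfall.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))
y = np.maximum(0.0, 5.0 * np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.5, size=300))

# Calibration/validation split kept in time order (assumed 70/30 ratio).
split = int(0.7 * len(X))
X_cal, X_val = X[:split], X[split:]
y_cal, y_val = y[:split], y[split:]

# Step 1: tune C (cost), gamma and epsilon with K-fold CV (assumed: RBF kernel, K = 5).
param_grid = {
    "svr__C": [1.0, 4.0, 16.0],
    "svr__gamma": [0.1, 0.5, 1.935],
    "svr__epsilon": [0.1, 0.2, 0.5],
}
search = GridSearchCV(
    make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_root_mean_squared_error",
)
search.fit(X_cal, y_cal)
best = {name.split("__")[1]: value for name, value in search.best_params_.items()}
print("tuned parameters:", best)

# Step 2: refit each kernel with the tuned values and compare RMSE.
results = {}
for kernel in ("linear", "rbf", "poly", "sigmoid"):
    model = make_pipeline(StandardScaler(), SVR(kernel=kernel, **best))
    model.fit(X_cal, y_cal)
    results[kernel] = (rmse(y_cal, model.predict(X_cal)), rmse(y_val, model.predict(X_val)))

# Rank kernels by validation RMSE (lower is better).
for kernel, (cal, val) in sorted(results.items(), key=lambda kv: kv[1][1]):
    print(f"{kernel:8s} calibration RMSE={cal:.3f} validation RMSE={val:.3f}")
```

Tuning once and reusing the values across kernels mirrors the order described in the abstract; tuning each kernel separately is an alternative when a fairer per-kernel comparison is wanted.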

This work is licensed under a Creative Commons Attribution 4.0 International License.
