Regression-based Analytical Approach for Speech Emotion Prediction based on Multivariate Additive Regression Spline (MARS)

Budi Triandi (1), Syahril Efendi (2), Herman Mawengkang (3), Sawaluddin (4)
(1) Doctoral Program, Faculty of Computer Science and Information Technology, Universitas Sumatera Utara, Medan, 20155, Indonesia
(2) Faculty of Computer Science and Information Technology, Universitas Sumatera Utara, Medan, 20155, Indonesia
(3) Faculty of Mathematics and Sciences, Universitas Sumatera Utara, Medan, 20155, Indonesia
(4) Faculty of Mathematics and Sciences, Universitas Sumatera Utara, Medan, 20155, Indonesia
How to cite (IJASEIT):
Triandi, Budi, et al. “Regression-Based Analytical Approach for Speech Emotion Prediction Based on Multivariate Additive Regression Spline (MARS)”. International Journal on Advanced Science, Engineering and Information Technology, vol. 13, no. 6, Dec. 2023, pp. 2213-8, doi:10.18517/ijaseit.13.6.18603.
Regression analysis is a resource-efficient approach to speech emotion recognition (SER). Labeled speech emotion data is emotionally complex and ambiguous, which makes the task difficult. The maximum mean discrepancy is used to measure the marginal agreement between the source and target domains, but it does not account for the class prior distributions in the two domains. To address this issue, we propose speech emotion recognition using a regression analysis technique based on local domain adaptation. The results show that the local additive method gives the model very good generalization ability, which improves speech emotion recognition performance. Despite their benefits in resource efficiency, regression techniques are rarely used in the SER field; we nonetheless believe this method is the best solution for SER problems. Using the Multivariate Additive Regression Spline (MARS), this study developed a predictive model for the presence of angry and non-angry emotions. By analyzing the probability of error values, the approach can handle regression on data that is not normally distributed. The method yields an ideal basis function that has a significant impact on changes in emotional form. The resulting prediction model has a Mean Square Error (MSE) of 0.0130, a Generalized Cross Validation (GCV) value of 0.0062, and an R-squared (RSQ) value of 0.9721, with test results showing a 97% accuracy rate.
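
The quantities named in the abstract (basis functions, MSE, GCV, RSQ) can be illustrated with a minimal sketch. It is not the authors' implementation and does not use their data. The single synthetic feature, the hand-picked knots, the helper names (`hinge`, `build_basis`, `fit_and_score`), and the penalty of 3 in the effective-parameter count are all assumptions made for illustration. A full MARS procedure would also search for knots in a forward pass and prune terms in a backward pass, which this sketch omits.

```python
import numpy as np


def hinge(x, knot, sign):
    """Hinge basis function max(0, sign * (x - knot)) used by MARS."""
    return np.maximum(0.0, sign * (x - knot))


def build_basis(x, knots):
    """Design matrix: intercept plus a mirrored hinge pair at each knot."""
    cols = [np.ones_like(x, dtype=float)]
    for t in knots:
        cols.append(hinge(x, t, 1.0))
        cols.append(hinge(x, t, -1.0))
    return np.column_stack(cols)


def fit_and_score(x, y, knots, penalty=3.0):
    """Least-squares fit on fixed knots, then MSE, GCV, and R-squared."""
    B = build_basis(x, knots)
    coef, _, rank, _ = np.linalg.lstsq(B, y, rcond=None)
    resid = y - B @ coef
    n = y.size
    mse = float(np.mean(resid ** 2))
    # Mirrored pairs at several knots are linearly dependent, so the matrix
    # rank is used as the number of independent basis functions.
    # Assumed convention: effective parameters = M + penalty * (M - 1) / 2.
    eff_params = rank + penalty * (rank - 1) / 2.0
    gcv = mse / (1.0 - eff_params / n) ** 2
    rsq = 1.0 - float(np.sum(resid ** 2)) / float(np.sum((y - y.mean()) ** 2))
    return coef, {"mse": mse, "gcv": gcv, "rsq": rsq}


# Synthetic stand-in for one acoustic feature and an "angry" score in [0, 1].
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
y = np.clip(4.0 * (x - 0.5), 0.0, 1.0) + rng.normal(0.0, 0.05, size=200)

coef, scores = fit_and_score(x, y, knots=[0.5, 0.75])
print(scores)
```

In this sketch the GCV divides the training MSE by a factor that shrinks as more basis functions are added, so it penalizes model complexity in a way the plain MSE does not.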

This work is licensed under a Creative Commons Attribution 4.0 International License.
