Emotion Recognition and Multi-class Classification in Music with MFCC and Machine Learning

Gilsang Yoo (1), Sungdae Hong (2), Hyeocheol Kim (3)
(1) Creative Informatics and Computing Institute, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul, 02841, Republic of Korea
(2) Division of Design, Seokyeong University, 124 Seogyeong-ro, Seongbuk-gu, Seoul, 02173, Republic of Korea
(3) Creative Informatics and Computing Institute, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul, 02841, Republic of Korea
How to cite (IJASEIT):
Yoo, Gilsang, et al. “Emotion Recognition and Multi-Class Classification in Music With MFCC and Machine Learning”. International Journal on Advanced Science, Engineering and Information Technology, vol. 14, no. 3, June 2024, pp. 818-25, doi:10.18517/ijaseit.14.3.18671.
Background music in over-the-top (OTT) streaming services enhances narratives and conveys emotion, yet users with hearing impairments may not fully experience this emotional context. This paper examines the role of background music in user engagement on OTT platforms and introduces a system designed to reduce the difficulty that hearing-impaired users face in perceiving the emotional nuances of music. The system identifies the mood of background music and renders it as textual subtitles, making the emotional content accessible to all users. The proposed method extracts audio features, including Mel Frequency Cepstral Coefficients (MFCC), Root Mean Square (RMS) energy, and Mel spectrograms. It then applies four machine learning algorithms, namely Logistic Regression, Random Forest, AdaBoost, and Support Vector Classification (SVC), to classify the emotion expressed in the music. Among these, the Random Forest algorithm applied to MFCC features stood out, reaching an accuracy of 94.8% in our tests. Beyond feature identification, the technology could improve the accessibility of multimedia content: by automatically generating subtitles that convey the mood of the music, the system can enrich the viewing experience for all users, particularly those with hearing impairments. The work underscores the role of music in storytelling and emotional engagement and highlights the potential of machine learning to make digital entertainment more inclusive and enjoyable for diverse audiences.
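To make two of the features named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It computes frame-wise RMS energy and converts frequencies to the mel scale that underlies MFCC and Mel spectrograms. The function names, the frame length of 2048 samples, the hop of 512 samples, the 22050 Hz sample rate, and the synthetic 440 Hz test tone are all illustrative assumptions; the paper's actual parameters are not given in this abstract. The mel formula shown is the common HTK-style convention, one of several in use.

```python
import math


def frame_rms(samples, frame_length=2048, hop_length=512):
    """Return the RMS energy of each full frame of `samples`.

    frame_length and hop_length are assumed values, not taken from the paper.
    Trailing samples that do not fill a whole frame are ignored.
    """
    values = []
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frame = samples[start:start + frame_length]
        mean_square = sum(x * x for x in frame) / frame_length
        values.append(math.sqrt(mean_square))
    return values


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Synthetic stand-in for an audio clip: one second of a 440 Hz sine tone.
sample_rate = 22050  # assumed sample rate
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

rms_per_frame = frame_rms(tone)
print(len(rms_per_frame), "frames; first RMS value:", round(rms_per_frame[0], 3))
print("440 Hz in mels:", round(hz_to_mel(440.0), 1))
```

For a full-amplitude sine tone, each frame's RMS is close to 1/sqrt(2). In a pipeline of the kind the abstract describes, per-frame values like these would typically be summarized (for example by mean and variance) into a fixed-length vector before being passed to a classifier such as Random Forest.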

This work is licensed under a Creative Commons Attribution 4.0 International License.
