Text and Sound-Based Feature Extraction and Speech Emotion Classification for Korean
A.-H. Jo and K.-C. Kwak, “Speech Emotion Recognition Based on Two-Stream Deep Learning Model Using Korean Audio Information,” Applied Sciences, vol. 13, no. 4, p. 2167, Feb. 2023, doi:10.3390/app13042167.
J. Wagner et al., “Dawn of the Transformer Era in Speech Emotion Recognition: Closing the Valence Gap,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 9, pp. 10745–10759, Sep. 2023, doi: 10.1109/tpami.2023.3263585.
C. Hema and F. P. Garcia Marquez, “Emotional speech Recognition using CNN and Deep learning techniques,” Applied Acoustics, vol. 211, p. 109492, Aug. 2023, doi: 10.1016/j.apacoust.2023.109492.
C. Min et al., “Finding hate speech with auxiliary emotion detection from self-training multi-label learning perspective,” Information Fusion, vol. 96, pp. 214–223, Aug. 2023, doi:10.1016/j.inffus.2023.03.015.
V. Singh and S. Prasad, “Speech emotion recognition system using gender dependent convolution neural network,” Procedia Computer Science, vol. 218, pp. 2533–2540, 2023, doi:10.1016/j.procs.2023.01.227.
Kumar et al., “Multilayer Neural Network Based Speech Emotion Recognition for Smart Assistance,” Computers, Materials & Continua, vol. 74, no. 1, pp. 1523–1540, 2023, doi:10.32604/cmc.2023.028631.
M. Jain et al., “Speech emotion recognition using support vector machine,” arXiv preprint arXiv:2002.07590, 2020.
S. R. Kadiri and P. Alku, “Excitation Features of Speech for Speaker-Specific Emotion Detection,” IEEE Access, vol. 8, pp. 60382–60391, 2020, doi: 10.1109/access.2020.2982954.
H. Aouani and Y. B. Ayed, “Speech Emotion Recognition with deep learning,” Procedia Computer Science, vol. 176, pp. 251–260, 2020, doi: 10.1016/j.procs.2020.08.027.
D. Issa, M. Fatih Demirci, and A. Yazici, “Speech emotion recognition with deep convolutional neural networks,” Biomedical Signal Processing and Control, vol. 59, p. 101894, May 2020, doi:10.1016/j.bspc.2020.101894.
M. B. Akçay and K. Oğuz, “Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers,” Speech Communication, vol. 116, pp. 56–76, Jan. 2020, doi: 10.1016/j.specom.2019.12.001.
S. M. S. A. Abdullah, S. Y. A. Ameen, M. A. M. Sadeeq, and S. Zeebaree, “Multimodal Emotion Recognition using Deep Learning,” Journal of Applied Science and Technology Trends, vol. 2, no. 01, pp. 73–79, May 2021, doi: 10.38094/jastt20291.
W. Liu, J.-L. Qiu, W.-L. Zheng, and B.-L. Lu, “Comparing Recognition Performance and Robustness of Multimodal Deep Learning Models for Multimodal Emotion Recognition,” IEEE Transactions on Cognitive and Developmental Systems, vol. 14, no. 2, pp. 715–729, Jun. 2022, doi: 10.1109/tcds.2021.3071170.
Y. Cimtay, E. Ekmekcioglu, and S. Caglar-Ozhan, “Cross-Subject Multimodal Emotion Recognition Based on Hybrid Fusion,” IEEE Access, vol. 8, pp. 168865–168878, 2020, doi:10.1109/access.2020.3023871.
T. Mittal, U. Bhattacharya, R. Chandra, A. Bera, and D. Manocha, “M3ER: Multiplicative Multimodal Emotion Recognition using Facial, Textual, and Speech Cues,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 02, pp. 1359–1367, Apr. 2020, doi:10.1609/aaai.v34i02.5492.
T. Mittal, P. Guhan, U. Bhattacharya, R. Chandra, A. Bera, and D. Manocha, “EmotiCon: Context-Aware Multimodal Emotion Recognition Using Frege’s Principle,” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2020, doi:10.1109/cvpr42600.2020.01424.
C. Luna-Jiménez, D. Griol, Z. Callejas, R. Kleinlein, J. M. Montero, and F. Fernández-Martínez, “Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning,” Sensors, vol. 21, no. 22, p. 7665, Nov. 2021, doi: 10.3390/s21227665.
B. Chen, Q. Cao, M. Hou, Z. Zhang, G. Lu, and D. Zhang, “Multimodal Emotion Recognition With Temporal and Semantic Consistency,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 3592–3603, 2021, doi: 10.1109/taslp.2021.3129331.
S. Lee, D. K. Han, and H. Ko, “Multimodal Emotion Recognition Fusion Analysis Adapting BERT With Heterogeneous Feature Unification,” IEEE Access, vol. 9, pp. 94557–94572, 2021, doi:10.1109/access.2021.3092735.
S. S. R, J. S. B, and R. R, “Comprehensive Speech Emotion Recognition System Employing Multi-Layer Perceptron (MLP) Classifier and libRosa Feature Extraction,” 2023 International Conference on Sustainable Communication Networks and Application (ICSCNA), Nov. 2023, doi: 10.1109/icscna58489.2023.10370394.
Y. C. Yoon, “Can We Exploit All Datasets? Multimodal Emotion Recognition Using Cross-Modal Translation,” IEEE Access, vol. 10, pp. 64516–64524, 2022, doi: 10.1109/access.2022.3183587.
N. Zeghidour et al., “LEAF: A learnable frontend for audio classification,” arXiv preprint arXiv:2101.08596, 2021.
A. Baevski et al., “wav2vec 2.0: A framework for self-supervised learning of speech representations,” in Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS '20), Curran Associates Inc., Red Hook, NY, USA, 2020, pp. 12449–12460.
B. Zoph et al., “Rethinking pre-training and self-training,” in Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS '20), Curran Associates Inc., Red Hook, NY, USA, 2020, pp. 3833–3845.
F.-L. Chen et al., “VLP: A Survey on Vision-language Pre-training,” Machine Intelligence Research, vol. 20, no. 1, pp. 38–56, Jan. 2023, doi: 10.1007/s11633-022-1369-5.
H. Bao et al., “BEiT: BERT pre-training of image transformers,” arXiv preprint arXiv:2106.08254, 2021.
A. El-Nouby et al., “Are large-scale datasets necessary for self-supervised pre-training?,” arXiv preprint arXiv:2112.10740, 2021.
Z. Jiang et al., “Robust pre-training by adversarial contrastive learning,” in Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS '20), Curran Associates Inc., Red Hook, NY, USA, 2020, pp. 16199–16210.
L. H. Li et al., “Grounded Language-Image Pre-training,” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2022, doi: 10.1109/cvpr52688.2022.01069.
L. Ruan and Q. Jin, “Survey: Transformer based video-language pre-training,” AI Open, vol. 3, pp. 1–13, 2022, doi:10.1016/j.aiopen.2022.01.001.
W. Ahmad, S. Chakraborty, B. Ray, and K.-W. Chang, “Unified Pre-training for Program Understanding and Generation,” Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021, doi: 10.18653/v1/2021.naacl-main.211.
C. Yi et al., “Applying wav2vec2.0 to speech recognition in various low-resource languages,” arXiv preprint arXiv:2012.12121, 2020.
O. Mohamed and S. A. Aly, “ASER: Arabic Speech Emotion Recognition Employing Wav2vec2.0 and HuBERT Based on BAVED Dataset,” Transactions on Machine Learning and Artificial Intelligence, vol. 9, no. 6, pp. 1–8, Nov. 2021, doi: 10.14738/tmlai.96.11039.
M. Sharma, “Multi-Lingual Multi-Task Speech Emotion Recognition Using wav2vec 2.0,” ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2022, doi: 10.1109/icassp43922.2022.9747417.
Y. C. Yoon, S. Y. Park, S. M. Park, and H. Lim, “Image classification and captioning model considering a CAM‐based disagreement loss,” ETRI Journal, vol. 42, no. 1, pp. 67–77, Jul. 2019, doi:10.4218/etrij.2018-0621.
This work is licensed under a Creative Commons Attribution 4.0 International License.