Improvement Model for Speaker Recognition using MFCC-CNN and Online Triplet Mining
This work is licensed under a Creative Commons Attribution 4.0 International License.