Human-Robot Interaction Based on Dialog Management Using Sentence Similarity Comparison Method

Dinda Ayu Permatasari (1), Hanif Fakhrurroja (2), Carmadi Machbub (3)
(1) School of Electrical Engineering and Informatics, Bandung Institute of Technology, Jl. Ganesha 10, Bandung, 40132, Indonesia
(2) School of Electrical Engineering and Informatics, Bandung Institute of Technology, Jl. Ganesha 10, Bandung, 40132, Indonesia
(3) School of Electrical Engineering and Informatics, Bandung Institute of Technology, Jl. Ganesha 10, Bandung, 40132, Indonesia
How to cite (IJASEIT):
Permatasari, Dinda Ayu, et al. “Human-Robot Interaction Based on Dialog Management Using Sentence Similarity Comparison Method”. International Journal on Advanced Science, Engineering and Information Technology, vol. 10, no. 5, Oct. 2020, pp. 1881-8, doi:10.18517/ijaseit.10.5.7606.
Dialogue systems have advanced in speech recognition, language understanding, and speech synthesis, and they allow humans to interact with a robot efficiently using spoken language. Facilities that make daily activities easier for people such as older adults are needed, and Human-Robot Interaction (HRI) lets a user give orders to a robot to do work that humans cannot do. This study presents a dialogue management system for HRI that compares two sentence similarity methods, TF-IDF (Term Frequency-Inverse Document Frequency) with Cosine Similarity and the Jaccard Coefficient, and that uses a Finite State Machine (FSM). Dialogue management determines the system's response when the user says something; in other words, it manages the flow of the conversation used to command the robot. TF-IDF is used to weight the terms, and Cosine Similarity and the Jaccard Coefficient are compared as ways for the dialogue manager to classify sentence similarity and thereby better recognize the intent of an utterance, while the FSM sets the sequence of the dialogue flow. We use the Google Cloud Speech API as the speech-to-text engine, with a Kinect V2 as the audio sensor. Eight scenarios were created for the system. Speech recognition with Google Speech takes 2.62 seconds on average, a reasonably fast response. The TF-IDF Cosine Similarity method achieves an accuracy of 97.43%, whereas the Jaccard Coefficient achieves 91.57%. The FSM can be considered an efficient structure for building dialogue management.
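The abstract does not specify the exact TF-IDF variant, the tokenization, or the reference sentences, so the following is only a minimal Python sketch under stated assumptions. It assumes whitespace tokenization, raw term counts, IDF computed as log(N/df) over a small set of hypothetical reference commands (the `REFERENCES` dictionary and its intent names are invented for illustration), and that an utterance is assigned to the reference sentence with the highest score. It contrasts the two measures compared in the study: cosine similarity over TF-IDF vectors and the Jaccard coefficient over token sets.

```python
import math
from collections import Counter

# Hypothetical reference utterances mapped to robot intents (illustrative only, not from the paper).
REFERENCES = {
    "move_forward": "robot please move forward",
    "turn_left": "robot please turn left",
    "bring_water": "please bring me a glass of water",
    "stop": "robot stop now",
}


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def idf_table(docs):
    """IDF(t) = log(N / df(t)), computed over the tokenized reference sentences only."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Raw term frequency times IDF; terms never seen in the references are dropped."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def jaccard(a, b):
    """|A intersect B| / |A union B| over the sets of tokens."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def classify(utterance):
    """Return (best intent by TF-IDF cosine, best intent by Jaccard)."""
    ref_tokens = {intent: tokenize(s) for intent, s in REFERENCES.items()}
    idf = idf_table(list(ref_tokens.values()))
    ref_vecs = {intent: tfidf_vector(toks, idf) for intent, toks in ref_tokens.items()}
    query = tokenize(utterance)
    q_vec = tfidf_vector(query, idf)
    by_cosine = max(ref_vecs, key=lambda i: cosine(q_vec, ref_vecs[i]))
    by_jaccard = max(ref_tokens, key=lambda i: jaccard(query, ref_tokens[i]))
    return by_cosine, by_jaccard


if __name__ == "__main__":
    print(classify("could you turn left please"))
```

For example, `classify("could you turn left please")` returns `("turn_left", "turn_left")`. Because cosine similarity works on TF-IDF weights, rare and informative words such as "turn" count for more than words shared by many commands such as "please"; the Jaccard coefficient treats every shared token equally.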
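The abstract states that a Finite State Machine sets the sequence of the dialogue flow but does not list its states. The Python sketch below assumes a hypothetical four-state flow (idle, awaiting a command, confirming, executing); the states, event names, and replies are invented for illustration. Events would be the intents produced by a sentence similarity classifier, plus a `task_done` signal assumed to come from the robot controller.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()           # waiting for the user to start a conversation
    AWAIT_COMMAND = auto()  # listening for a command
    CONFIRM = auto()        # asking the user to confirm the command
    EXECUTE = auto()        # carrying out the confirmed command


# Hypothetical transition table: (current state, event) -> (next state, robot reply).
TRANSITIONS = {
    (State.IDLE, "greet"): (State.AWAIT_COMMAND, "Hello, what can I do for you?"),
    (State.AWAIT_COMMAND, "command"): (State.CONFIRM, "Shall I do that now?"),
    (State.CONFIRM, "yes"): (State.EXECUTE, "Okay, executing."),
    (State.CONFIRM, "no"): (State.AWAIT_COMMAND, "Alright, what should I do instead?"),
    (State.EXECUTE, "task_done"): (State.IDLE, "Task finished."),
}

REPROMPT = "Sorry, I did not understand. Please repeat."


class DialogueFSM:
    """Tracks the dialogue state and returns the robot's reply for each event."""

    def __init__(self):
        self.state = State.IDLE

    def step(self, event):
        """Follow the transition for (state, event); unknown pairs keep the state and reprompt."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            return REPROMPT
        self.state, reply = TRANSITIONS[key]
        return reply


if __name__ == "__main__":
    fsm = DialogueFSM()
    for event in ["greet", "yes", "command", "yes", "task_done"]:
        print(event, "->", fsm.step(event), "| state:", fsm.state.name)
```

Because every legal move is an explicit entry in `TRANSITIONS`, an utterance that does not fit the current state (for example "yes" before any command has been given) leaves the state unchanged and triggers a reprompt instead of an unexpected robot action.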
