Cite Article

Translated vs Non-Translated Method for Multilingual Hate Speech Identification in Twitter

BibTeX

@article{IJASEIT8123,
   author = {Muhammad Okky Ibrohim and Indra Budi},
   title = {Translated vs Non-Translated Method for Multilingual Hate Speech Identification in Twitter},
   journal = {International Journal on Advanced Science, Engineering and Information Technology},
   volume = {9},
   number = {4},
   year = {2019},
   pages = {1116--1123},
   keywords = {social media; multilingual hate speech identification; machine learning.},
   abstract = {Nowadays social media is often misused to spread hate speech. Spreading hate speech is an act that needs to be handled in a special way because it can undermine or discriminate other people and cause conflict that leading to both material and immaterial losses. There are several challenges in building a hate speech identification system; one of them is identifying hate speech in multilingual scope. In this paper, we adapt and compare two methods in multilingual text classification which are translated (with and without language identification) and non-translated method for multilingual hate speech identification (including Hindi, English, and Indonesian language) using machine learning approach. We use some classification algorithms (classifiers) namely Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest Decision Tree (RFDT) with word n-grams and char n-grams (character n-grams) as feature extraction. Our experiment result shows that the non-translated method gives the best result. However, the use of non-translated method needs to be reconsidered because this method needs more cost for data collection and annotation. Meanwhile, translated without language identification method give a poor result. To address this problem, we combine translated method with monolingual hate speech identification, and the experiment result shows that this approach can increase the multilingual hate speech identification performance compared to translate without language identification. This paper discusses the advantages and disadvantages for all method and the future works to enhance the performance in multilingual hate speech identification.},
   issn = {2088-5334},
   publisher = {INSIGHT - Indonesian Society for Knowledge and Human Development},
   url = {http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=8123},
   doi = {10.18517/ijaseit.9.4.8123}
}

EndNote

%0 Journal Article
%A Ibrohim, Muhammad Okky
%A Budi, Indra
%D 2019
%T Translated vs Non-Translated Method for Multilingual Hate Speech Identification in Twitter
%P 1116-1123
%! Translated vs Non-Translated Method for Multilingual Hate Speech Identification in Twitter
%K social media; multilingual hate speech identification; machine learning.
%X Nowadays social media is often misused to spread hate speech. Spreading hate speech is an act that needs to be handled in a special way because it can undermine or discriminate other people and cause conflict that leading to both material and immaterial losses. There are several challenges in building a hate speech identification system; one of them is identifying hate speech in multilingual scope. In this paper, we adapt and compare two methods in multilingual text classification which are translated (with and without language identification) and non-translated method for multilingual hate speech identification (including Hindi, English, and Indonesian language) using machine learning approach. We use some classification algorithms (classifiers) namely Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest Decision Tree (RFDT) with word n-grams and char n-grams (character n-grams) as feature extraction. Our experiment result shows that the non-translated method gives the best result. However, the use of non-translated method needs to be reconsidered because this method needs more cost for data collection and annotation. Meanwhile, translated without language identification method give a poor result. To address this problem, we combine translated method with monolingual hate speech identification, and the experiment result shows that this approach can increase the multilingual hate speech identification performance compared to translate without language identification. This paper discusses the advantages and disadvantages for all method and the future works to enhance the performance in multilingual hate speech identification.
%U http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=8123
%R doi:10.18517/ijaseit.9.4.8123
%J International Journal on Advanced Science, Engineering and Information Technology
%V 9
%N 4
%@ 2088-5334

IEEE

Muhammad Okky Ibrohim and Indra Budi, "Translated vs Non-Translated Method for Multilingual Hate Speech Identification in Twitter," International Journal on Advanced Science, Engineering and Information Technology, vol. 9, no. 4, pp. 1116-1123, 2019. [Online]. Available: http://dx.doi.org/10.18517/ijaseit.9.4.8123.

RefMan/ProCite (RIS)

TY  - JOUR
AU  - Ibrohim, Muhammad Okky
AU  - Budi, Indra
PY  - 2019
TI  - Translated vs Non-Translated Method for Multilingual Hate Speech Identification in Twitter
JF  - International Journal on Advanced Science, Engineering and Information Technology
VL  - 9
IS  - 4
SP  - 1116
EP  - 1123
SN  - 2088-5334
PB  - INSIGHT - Indonesian Society for Knowledge and Human Development
KW  - social media
KW  - multilingual hate speech identification
KW  - machine learning
N2  - Nowadays social media is often misused to spread hate speech. Spreading hate speech is an act that needs to be handled in a special way because it can undermine or discriminate other people and cause conflict that leading to both material and immaterial losses. There are several challenges in building a hate speech identification system; one of them is identifying hate speech in multilingual scope. In this paper, we adapt and compare two methods in multilingual text classification which are translated (with and without language identification) and non-translated method for multilingual hate speech identification (including Hindi, English, and Indonesian language) using machine learning approach. We use some classification algorithms (classifiers) namely Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest Decision Tree (RFDT) with word n-grams and char n-grams (character n-grams) as feature extraction. Our experiment result shows that the non-translated method gives the best result. However, the use of non-translated method needs to be reconsidered because this method needs more cost for data collection and annotation. Meanwhile, translated without language identification method give a poor result. To address this problem, we combine translated method with monolingual hate speech identification, and the experiment result shows that this approach can increase the multilingual hate speech identification performance compared to translate without language identification. This paper discusses the advantages and disadvantages for all method and the future works to enhance the performance in multilingual hate speech identification.
UR  - http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=8123
DO  - 10.18517/ijaseit.9.4.8123
ER  - 

RefWorks

RT Journal Article
ID 8123
A1 Ibrohim, Muhammad Okky
A1 Budi, Indra
T1 Translated vs Non-Translated Method for Multilingual Hate Speech Identification in Twitter
JF International Journal on Advanced Science, Engineering and Information Technology
VO 9
IS 4
YR 2019
SP 1116
OP 1123
SN 2088-5334
PB INSIGHT - Indonesian Society for Knowledge and Human Development
K1 social media; multilingual hate speech identification; machine learning.
AB Nowadays social media is often misused to spread hate speech. Spreading hate speech is an act that needs to be handled in a special way because it can undermine or discriminate other people and cause conflict that leading to both material and immaterial losses. There are several challenges in building a hate speech identification system; one of them is identifying hate speech in multilingual scope. In this paper, we adapt and compare two methods in multilingual text classification which are translated (with and without language identification) and non-translated method for multilingual hate speech identification (including Hindi, English, and Indonesian language) using machine learning approach. We use some classification algorithms (classifiers) namely Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest Decision Tree (RFDT) with word n-grams and char n-grams (character n-grams) as feature extraction. Our experiment result shows that the non-translated method gives the best result. However, the use of non-translated method needs to be reconsidered because this method needs more cost for data collection and annotation. Meanwhile, translated without language identification method give a poor result. To address this problem, we combine translated method with monolingual hate speech identification, and the experiment result shows that this approach can increase the multilingual hate speech identification performance compared to translate without language identification. This paper discusses the advantages and disadvantages for all method and the future works to enhance the performance in multilingual hate speech identification.
LK http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=8123
DO 10.18517/ijaseit.9.4.8123