International Journal on Advanced Science, Engineering and Information Technology, Vol. 10 (2020) No. 1, pages: 234-238, DOI:10.18517/ijaseit.10.1.8858

Indonesian Text Classification using Back Propagation and Sastrawi Stemming Analysis with Information Gain for Selection Feature

Mahendra Dwifebri Purbolaksono, Feddy Dea Reskyadita, Adiwijaya, Arie Ardiyanti Suryani, Arief Fatchul Huda

Abstract

The Hadith is the second fundamental source of law for Muslims and can be used to explain Quranic texts. However, the Hadith still needs to be translated into each national language so that its meaning can be easily understood [1]. In Indonesia, the Hadith more usually refers to a special class of relevance to more particular religious concerns [1]. Based on that, this research classifies translated Hadith text into three classes: Obligation, Prohibition, and Information. In previous research, the Back Propagation Neural Network (BPNN) showed good performance in classifying Hadith text, so BPNN was used for Hadith text classification in this study. However, the dataset contains a large and varied bag-of-words vocabulary, whose entries serve as the features for classification. Hence, Information Gain (IG) was applied to select influential features as a step before classification. System performance was measured with the Macro F1-score. The F1-score reflects both exactness (precision) and completeness (recall), and its macro average is needed to evaluate performance over more than two classes. In the experiments, the system using BPNN and IG without stemming yielded the highest F1-score, 84.63%, whereas the system that included the stemming process yielded an F1-score of 80.92%. This shows that stemming can decrease classification performance. The decrease occurs because some influential words are merged with noninfluential words.
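To make the two quantities named in the abstract concrete, the following is a minimal Python sketch of Information Gain for a binary "term occurs in document" feature and of the Macro F1-score. It is an illustration under stated assumptions, not the authors' implementation: the function names (`entropy`, `information_gain`, `macro_f1`), the representation of documents as token sets, and the toy English tokens and labels are all hypothetical and do not come from the paper's dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document'.

    docs is a list of token sets; labels is the matching list of classes.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data, invented for illustration only (not from the paper).
toy_docs = [{"must", "pray"}, {"must", "fast"},
            {"do", "not", "lie"}, {"do", "not", "steal"}]
toy_labels = ["Obligation", "Obligation", "Prohibition", "Prohibition"]

ig_must = information_gain(toy_docs, toy_labels, "must")
ig_pray = information_gain(toy_docs, toy_labels, "pray")

score = macro_f1(
    ["Obligation", "Obligation", "Prohibition", "Information"],
    ["Obligation", "Prohibition", "Prohibition", "Information"],
)
```

In this toy setup, a term that separates the classes perfectly ("must") receives a higher gain than one that appears in a single document ("pray"), which is the kind of ranking a feature selection step could use to keep the most influential terms. Because the macro average weights every class equally, a weak score on a small class lowers the overall result as much as a weak score on a large one.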

Keywords:

feature selection; information gain; text mining; neural network; classification.
