Cite Article

Development of Rule-Based Feature Extraction in Multi-label Text Classification

BibTeX

@article{IJASEIT8894,
   author = {Gugun Mediamer and {Adiwijaya} and Said Al Faraby},
   title = {Development of Rule-Based Feature Extraction in Multi-label Text Classification},
   journal = {International Journal on Advanced Science, Engineering and Information Technology},
   volume = {9},
   number = {4},
   year = {2019},
   pages = {1460--1465},
   keywords = {multi-label classification; Bukhari Hadith; feature-weighted; tf-idf; word2vec; hamming loss},
   abstract = {Hadith is the second main guideline after the Holy Quran in the Islamic religion, which was revealed through the Messenger of Allah. A Hadith can be assigned to more than one class, such as advice, prohibition, and information, which helps readers filter the appropriate classes for each Hadith of Rasulullah SAW. Text classification studies involve many kinds of data, so handling that fits the characteristics of the data at hand is required. This study investigates the handling of multi-label data (Hadith Bukhari in Indonesian translation), focusing on feature extraction, feature weighting, and preprocessing methods. It combines a rule-based feature extraction with several preprocessing configurations and three feature-weighting methods: TF-IDF, Word2vec, and Word2vec weighted with TF-IDF. The five preprocessing stages are Case Folding, Tokenization, Punctuation Removal, Stopword Removal, and Stemming. Across the 13 experiments conducted on 2000 hadiths, the best multi-label classification performance was produced by combining the proposed rule-based feature extraction with Word2vec feature weighting and omitting Stemming and Stopword Removal in the preprocessing phase. The Hamming Loss obtained from this combination was 0.0623. The results show that the proposed rule-based feature extraction method performs better than the baseline method.},
   issn = {2088-5334},
   publisher = {INSIGHT - Indonesian Society for Knowledge and Human Development},
   url = {http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=8894},
   doi = {10.18517/ijaseit.9.4.8894}
}

EndNote

%A Mediamer, Gugun
%A Adiwijaya
%A Faraby, Said Al
%D 2019
%T Development of Rule-Based Feature Extraction in Multi-label Text Classification
%! Development of Rule-Based Feature Extraction in Multi-label Text Classification
%K multi-label classification; Bukhari Hadith; feature-weighted; tf-idf; word2vec; hamming loss
%X Hadith is the second main guideline after the Holy Quran in the Islamic religion, which was revealed through the Messenger of Allah. A Hadith can be assigned to more than one class, such as advice, prohibition, and information, which helps readers filter the appropriate classes for each Hadith of Rasulullah SAW. Text classification studies involve many kinds of data, so handling that fits the characteristics of the data at hand is required. This study investigates the handling of multi-label data (Hadith Bukhari in Indonesian translation), focusing on feature extraction, feature weighting, and preprocessing methods. It combines a rule-based feature extraction with several preprocessing configurations and three feature-weighting methods: TF-IDF, Word2vec, and Word2vec weighted with TF-IDF. The five preprocessing stages are Case Folding, Tokenization, Punctuation Removal, Stopword Removal, and Stemming. Across the 13 experiments conducted on 2000 hadiths, the best multi-label classification performance was produced by combining the proposed rule-based feature extraction with Word2vec feature weighting and omitting Stemming and Stopword Removal in the preprocessing phase. The Hamming Loss obtained from this combination was 0.0623. The results show that the proposed rule-based feature extraction method performs better than the baseline method.
%U http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=8894
%R doi:10.18517/ijaseit.9.4.8894
%J International Journal on Advanced Science, Engineering and Information Technology
%V 9
%N 4
%P 1460-1465
%@ 2088-5334

IEEE

Gugun Mediamer, Adiwijaya, and Said Al Faraby, "Development of Rule-Based Feature Extraction in Multi-label Text Classification," International Journal on Advanced Science, Engineering and Information Technology, vol. 9, no. 4, pp. 1460-1465, 2019. [Online]. Available: http://dx.doi.org/10.18517/ijaseit.9.4.8894.

RefMan/ProCite (RIS)

TY  - JOUR
AU  - Mediamer, Gugun
AU  - Adiwijaya
AU  - Faraby, Said Al
PY  - 2019
TI  - Development of Rule-Based Feature Extraction in Multi-label Text Classification
JF  - International Journal on Advanced Science, Engineering and Information Technology
VL  - 9
IS  - 4
Y2  - 2019
SP  - 1460
EP  - 1465
SN  - 2088-5334
PB  - INSIGHT - Indonesian Society for Knowledge and Human Development
KW  - multi-label classification
KW  - Bukhari Hadith
KW  - feature-weighted
KW  - tf-idf
KW  - word2vec
KW  - hamming loss
N2  - Hadith is the second main guideline after the Holy Quran in the Islamic religion, which was revealed through the Messenger of Allah. A Hadith can be assigned to more than one class, such as advice, prohibition, and information, which helps readers filter the appropriate classes for each Hadith of Rasulullah SAW. Text classification studies involve many kinds of data, so handling that fits the characteristics of the data at hand is required. This study investigates the handling of multi-label data (Hadith Bukhari in Indonesian translation), focusing on feature extraction, feature weighting, and preprocessing methods. It combines a rule-based feature extraction with several preprocessing configurations and three feature-weighting methods: TF-IDF, Word2vec, and Word2vec weighted with TF-IDF. The five preprocessing stages are Case Folding, Tokenization, Punctuation Removal, Stopword Removal, and Stemming. Across the 13 experiments conducted on 2000 hadiths, the best multi-label classification performance was produced by combining the proposed rule-based feature extraction with Word2vec feature weighting and omitting Stemming and Stopword Removal in the preprocessing phase. The Hamming Loss obtained from this combination was 0.0623. The results show that the proposed rule-based feature extraction method performs better than the baseline method.
UR  - http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=8894
DO  - 10.18517/ijaseit.9.4.8894
ER  - 

RefWorks

RT Journal Article
ID 8894
A1 Mediamer, Gugun
A1 Adiwijaya
A1 Faraby, Said Al
T1 Development of Rule-Based Feature Extraction in Multi-label Text Classification
JF International Journal on Advanced Science, Engineering and Information Technology
VO 9
IS 4
YR 2019
SP 1460
OP 1465
SN 2088-5334
PB INSIGHT - Indonesian Society for Knowledge and Human Development
K1 multi-label classification; Bukhari Hadith; feature-weighted; tf-idf; word2vec; hamming loss
AB Hadith is the second main guideline after the Holy Quran in the Islamic religion, which was revealed through the Messenger of Allah. A Hadith can be assigned to more than one class, such as advice, prohibition, and information, which helps readers filter the appropriate classes for each Hadith of Rasulullah SAW. Text classification studies involve many kinds of data, so handling that fits the characteristics of the data at hand is required. This study investigates the handling of multi-label data (Hadith Bukhari in Indonesian translation), focusing on feature extraction, feature weighting, and preprocessing methods. It combines a rule-based feature extraction with several preprocessing configurations and three feature-weighting methods: TF-IDF, Word2vec, and Word2vec weighted with TF-IDF. The five preprocessing stages are Case Folding, Tokenization, Punctuation Removal, Stopword Removal, and Stemming. Across the 13 experiments conducted on 2000 hadiths, the best multi-label classification performance was produced by combining the proposed rule-based feature extraction with Word2vec feature weighting and omitting Stemming and Stopword Removal in the preprocessing phase. The Hamming Loss obtained from this combination was 0.0623. The results show that the proposed rule-based feature extraction method performs better than the baseline method.
LK http://ijaseit.insightsociety.org/index.php?option=com_content&view=article&id=9&Itemid=1&article_id=8894
DO 10.18517/ijaseit.9.4.8894