Improving Stemming Algorithm Using Morphological Rules

Titin Winarti (1), Djati Kerami (2), Lussiana ETP (3), Sunny Arief Sudiro (4)
(1) Semarang University
(2) University of Indonesia
(3) School of Information Management Jakarta
(4) School of Information Management Jakarta
How to cite (IJASEIT):
Winarti, Titin, et al. “Improving Stemming Algorithm Using Morphological Rules”. International Journal on Advanced Science, Engineering and Information Technology, vol. 7, no. 5, Oct. 2017, pp. 1758-64, doi:10.18517/ijaseit.7.5.1705.
Stemming, the removal of affixes from words, has applications in text search, machine translation, document summarization, and text classification. For example, Indonesian stemming reduces the words “kebaikan”, “perbaikan”, “memperbaiki” and “sebaik-baiknya” to their common morphological root “baik”. In text search, this permits a search for “player” to find documents containing all words with the stem “play”. In the Indonesian language, stemming is of crucial importance: words take prefixes, suffixes, infixes, and confixes, which makes matching related words difficult. This research proposes a stemmer that yields more accurate results by employing an algorithm that produces more than one candidate word and considers more than one affix combination. The new algorithm is called the CAT stemming algorithm. Its results do not depend on the order of the morphological rules: all rules are checked, and the resulting words are kept in a candidate list. To make the stemmer efficient, two kinds of word lists (vocabularies) are used: a list of words that have more than one candidate, and a list of root words that serves as the candidate reference. The final result is selected from the candidates using several rules. In the experiments, the proposed approach achieved higher accuracy than the two best-known Indonesian stemmers against which it was compared.
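
The candidate-list idea described in the abstract can be illustrated with a minimal Python sketch. This is not the CAT algorithm itself, whose rules are not given on this page. The affix lists, the tiny root-word list, the function names, and the "prefer the longest root" tie-break are all assumptions made for illustration, and reduplicated forms such as “sebaik-baiknya” are not handled.

```python
# Hypothetical root-word list; the paper's actual vocabulary is not shown here.
ROOT_WORDS = {"baik", "beruang", "uang"}

# Small illustrative subsets of Indonesian affixes, not the paper's rule set.
PREFIXES = ["memper", "per", "ber", "me", "ke", "se", "di"]
SUFFIXES = ["kan", "nya", "an", "i"]


def candidate_stems(word):
    """Try every prefix/suffix combination and collect all results.

    No rule stops the search early, so the outcome does not depend on
    the order in which the affixes are listed.
    """
    candidates = set()
    for prefix in [""] + PREFIXES:
        if not word.startswith(prefix):
            continue
        without_prefix = word[len(prefix):]
        for suffix in [""] + SUFFIXES:
            if not without_prefix.endswith(suffix):
                continue
            stem = without_prefix[:len(without_prefix) - len(suffix)]
            if stem:
                candidates.add(stem)
    return candidates


def valid_roots(word):
    """Keep only the candidates that appear in the root-word list."""
    return sorted(candidate_stems(word) & ROOT_WORDS)


def stem(word):
    """Pick one root from the valid candidates.

    Assumption: prefer the longest root (fewest characters removed).
    The paper's actual selection rules are not specified in the abstract.
    """
    roots = valid_roots(word)
    if not roots:
        return word  # no known root found; leave the word unchanged
    return max(roots, key=len)


print(stem("kebaikan"))        # baik
print(stem("memperbaiki"))     # baik
print(valid_roots("beruang"))  # ['beruang', 'uang']
```

The last call shows why keeping a candidate list matters: “beruang” can be read as the root word for "bear" or as “ber-” plus “uang” ("money"), so a stemmer that stops at the first matching rule may return either one depending on rule order.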