Automated Label Extraction for Sentiment Analysis in Indonesian Text

Khairul Imtihan (1), Lalu Mutawalli (2), Wire Bagye (3), Ahmad Tantoni (4)
(1) Information Systems Department, STMIK Lombok, Praya, Lombok Tengah, Indonesia
(2) Information Systems Department, STMIK Lombok, Praya, Lombok Tengah, Indonesia
(3) Informatics Engineering Department, STMIK Lombok, Praya, Lombok Tengah, Indonesia
(4) Informatics Engineering Department, STMIK Lombok, Praya, Lombok Tengah, Indonesia
How to cite (IJASEIT):
K. Imtihan, L. Mutawalli, W. Bagye, and A. Tantoni, “Automated Label Extraction for Sentiment Analysis in Indonesian Text,” Int. J. Adv. Sci. Eng. Inf. Technol., vol. 15, no. 3, pp. 718–728, Jun. 2025.
Sentiment analysis plays a crucial role in helping businesses understand consumer perceptions, improve decision-making, and enhance customer satisfaction. However, large-scale sentiment classification of Indonesian-language text remains challenging because labeled datasets are scarce and computational resources are limited. This study introduces an automated sentiment labeling approach that combines chunking with Rule-Based Machine Translation (RBMT) to improve both efficiency and accuracy. Unlike Self-Supervised Learning (SSL), Active Learning (AL), and Transformer-based models (e.g., BERT), which demand extensive labeled data and high-performance computing, the proposed method offers a scalable and resource-efficient solution. A dataset of 225,000 entries was preprocessed and segmented into smaller chunks to improve processing efficiency. Seven classification algorithms were evaluated: Decision Tree, Support Vector Machine (SVM), Random Forest, K-Nearest Neighbors (KNN), Naïve Bayes, Logistic Regression, and Multilayer Perceptron (MLP). MLP and Random Forest achieved the highest accuracy, ranging from 0.886 to 0.900, confirming their effectiveness for sentiment classification. The proposed Chunking + RBMT method achieves 89.9% accuracy, outperforming SSL (87.3%) and AL (86.5%) and approaching the accuracy of Transformer-based models (90.5%) while requiring significantly fewer computational resources. These results show that chunking reduces computational overhead while maintaining high classification accuracy. Overall, the findings support the proposed approach as a practical alternative for large-scale sentiment classification in low-resource settings, with strong potential to improve automated sentiment analysis for Indonesian.
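
The abstract does not spell out the chunking and RBMT rules, so the following is only a minimal sketch under stated assumptions of how chunked, translation-based auto-labeling could work: texts are processed in fixed-size chunks, each text is translated word by word with a toy Indonesian-English dictionary (the simplest form of rule-based translation), and an English polarity lexicon assigns a positive, negative, or neutral label. The dictionaries `ID_EN` and `POLARITY`, the helpers `chunked`, `translate`, `label`, and `label_in_chunks`, and the negation rule are all hypothetical and are not the authors' pipeline.

```python
from itertools import islice
from typing import Iterable, Iterator

# Hypothetical toy Indonesian -> English dictionary standing in for an RBMT
# rule base (assumption: the authors' actual rules are not given).
ID_EN = {
    "bagus": "good",
    "mantap": "great",
    "buruk": "bad",
    "lambat": "slow",
    "tidak": "not",
    "pelayanan": "service",
    "sangat": "very",
}

# Hypothetical toy English polarity lexicon used to assign labels.
POLARITY = {"good": 1, "great": 1, "bad": -1, "slow": -1}


def chunked(items: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` items from `items`."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def translate(text: str) -> list[str]:
    """Word-by-word dictionary lookup; unknown words pass through unchanged."""
    return [ID_EN.get(token, token) for token in text.lower().split()]


def label(text: str) -> str:
    """Sum lexicon polarity; 'not' flips the sign of the next polar word."""
    score, negate = 0, False
    for word in translate(text):
        if word == "not":
            negate = True
            continue
        value = POLARITY.get(word, 0)
        if value:
            score += -value if negate else value
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def label_in_chunks(texts: Iterable[str], chunk_size: int = 1000) -> list[str]:
    """Label texts chunk by chunk; for a lazily read corpus, only one chunk
    of input texts is materialized at a time."""
    labels: list[str] = []
    for chunk in chunked(texts, chunk_size):
        labels.extend(label(text) for text in chunk)
    return labels


reviews = ["Pelayanan sangat bagus", "Pengiriman lambat", "tidak bagus", "biasa saja"]
print(label_in_chunks(reviews, chunk_size=2))
# -> ['positive', 'negative', 'negative', 'neutral']
```

Fixed-size chunks keep the working set bounded regardless of corpus size. A real RBMT system would add morphological analysis (for example, stripping Indonesian affixes) and reordering rules that this plain dictionary lookup omits; the resulting labels could then serve as training targets for classifiers such as MLP or Random Forest.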

This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors who publish with this journal agree to the following terms:

    1. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
    2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
    3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).