Comparison of Classification Algorithm and Language Model in Accounting Financial Transaction Record: A Natural Language Processing Approach

Bagas Adi Makayasa (1), Maria Ulfah Siregar (2), Bambang Sugiantoro (3), Agung Fatwanto (4)
(1) Department of Informatics, Faculty of Science and Technology, Universitas Islam Negeri Sunan Kalijaga Yogyakarta, 55281, Indonesia
(2) Department of Informatics, Faculty of Science and Technology, Universitas Islam Negeri Sunan Kalijaga Yogyakarta, 55281, Indonesia
(3) Department of Informatics, Faculty of Science and Technology, Universitas Islam Negeri Sunan Kalijaga Yogyakarta, 55281, Indonesia
(4) Department of Informatics, Faculty of Science and Technology, Universitas Islam Negeri Sunan Kalijaga Yogyakarta, 55281, Indonesia
How to cite (IJASEIT):
Makayasa, Bagas Adi, et al. “Comparison of Classification Algorithm and Language Model in Accounting Financial Transaction Record: A Natural Language Processing Approach”. International Journal on Advanced Science, Engineering and Information Technology, vol. 14, no. 3, June 2024, pp. 880-6, doi:10.18517/ijaseit.14.3.19179.
Financial records that do not follow accounting principles can cause avoidable problems. Yet many micro, small, and medium enterprises (MSMEs), though not all, still face obstacles in preparing financial reports because of their distinctive characteristics. Although a great deal of financial software is already available, this study investigates the opportunity to automate the recording of accounting transactions with a natural language processing (NLP) approach: the text written on a transaction form is interpreted and converted into an accounting journal entry (debits and credits). The experiments compared the performance of three classification algorithms, namely Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Random Forest, combined with traditional (TF-IDF and bag-of-words, BOW) and context-based (Word2Vec) language models. The corpus contains 200 financial transaction records in ten classes. The data are divided into two parts: a balanced dataset and an imbalanced dataset. On the balanced dataset, the SVM and Word2Vec pair gave the highest accuracy (92.5%), precision (92.5%), recall/sensitivity (93.33%), and F1 score (92%). However, these results are still lower than those of related semantic research, whose average performance reaches 95%. One factor that may have a significant effect is the small size of the corpus. The authors suggest enlarging the dataset and combining other language models such as GloVe and BERT. This study can also serve as a model for more complex financial transaction cases in future research.
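
The experimental design described above (transaction text, then a vector representation, then a classifier, then accuracy, precision, recall and F1) can be illustrated with a short standard-library Python sketch. It is not the authors' code. The six training records, three test records and the class names are hypothetical; only the BOW and TF-IDF representations with a k-nearest-neighbour classifier (cosine similarity, k = 3) are shown; and the smoothed IDF formula is one common variant among several. SVM, Random Forest and Word2Vec would normally come from libraries such as scikit-learn and gensim and are omitted here.

```python
import math
from collections import Counter

# Hypothetical toy records: (transaction text, journal class). Each class stands
# for one debit/credit pair; these are illustrative, NOT the paper's ten classes.
TRAIN = [
    ("sold goods to customer paid in cash", "cash_sale"),        # Dr Cash / Cr Sales
    ("cash received from sale of goods", "cash_sale"),
    ("paid electricity bill in cash", "utility_expense"),        # Dr Utilities / Cr Cash
    ("paid monthly water and electricity bill", "utility_expense"),
    ("bought office supplies on credit", "supplies_on_credit"),  # Dr Supplies / Cr Payable
    ("purchased stationery supplies on account", "supplies_on_credit"),
]
TEST = [
    ("customer paid cash for goods", "cash_sale"),
    ("electricity bill paid", "utility_expense"),
    ("bought supplies on account", "supplies_on_credit"),
]


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stop-word removal)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, one common variant of several."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def bow_vector(tokens, vocab):
    """Bag of words: raw term counts, restricted to the training vocabulary."""
    return {t: float(c) for t, c in Counter(tokens).items() if t in vocab}


def tfidf_vector(tokens, idf):
    """TF-IDF: raw term count times IDF, restricted to the training vocabulary."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(query, train_vecs, train_labels, k=3):
    """k-nearest neighbours: majority label among the k most similar records."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged (unweighted per-class mean) P, R and F1."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precs, recs, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precs.append(prec)
        recs.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    m = len(labels)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    return acc, sum(precs) / m, sum(recs) / m, sum(f1s) / m


def evaluate(vectorize, k=3):
    """Vectorise TRAIN and TEST with one representation, then classify with k-NN."""
    train_vecs = [vectorize(tokenize(text)) for text, _ in TRAIN]
    train_labels = [label for _, label in TRAIN]
    y_true = [label for _, label in TEST]
    y_pred = [knn_predict(vectorize(tokenize(text)), train_vecs, train_labels, k)
              for text, _ in TEST]
    return y_pred, macro_metrics(y_true, y_pred)


if __name__ == "__main__":
    idf = fit_idf([tokenize(text) for text, _ in TRAIN])
    vocab = set(idf)
    for name, vec in [("BOW", lambda toks: bow_vector(toks, vocab)),
                      ("TF-IDF", lambda toks: tfidf_vector(toks, idf))]:
        preds, (acc, p, r, f1) = evaluate(vec)
        print(f"{name}: acc={acc:.2f} P={p:.2f} R={r:.2f} F1={f1:.2f} {preds}")
```

On this toy data both representations label all three test queries correctly, so the interesting comparison only appears on a realistic corpus. The metrics here are macro-averaged. If the scores reported in the abstract are macro-averaged too (the abstract does not say), that could explain how the F1 score (92%) falls below both precision (92.5%) and recall (93.33%): the mean of per-class F1 values is never larger than the harmonic mean of the averaged precision and recall.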

This work is licensed under a Creative Commons Attribution 4.0 International License.
