Categorization of Malay Social Media Text and Normalization of Spelling Variations and Vowel-less Words