Customer Needs Classification from Online Social Media Using Bag-of-Concepts Representation

Kanjana Laosen (1), Adisak Intana (2), Phisitchai Chuaynukul (3)
(1) Andaman Intelligent Tourism and Service Informatics Center, College of Computing, Prince of Songkla University, Phuket, 83120, Thailand
(2) Andaman Intelligent Tourism and Service Informatics Center, College of Computing, Prince of Songkla University, Phuket, 83120, Thailand
(3) Andaman Intelligent Tourism and Service Informatics Center, College of Computing, Prince of Songkla University, Phuket, 83120, Thailand
How to cite (IJASEIT) :
Laosen, Kanjana, et al. “Customer Needs Classification from Online Social Media Using Bag-of-Concepts Representation”. International Journal on Advanced Science, Engineering and Information Technology, vol. 13, no. 4, Aug. 2023, pp. 1546-53, doi:10.18517/ijaseit.13.4.17369.
Social media platforms are now a powerful tool for digital marketing strategy because they let companies stay in direct contact with their customers. One communication problem on social media is that many customer need phrases are posted, which makes it difficult for businesses to find relevant posts and respond to customers immediately. Knowing and understanding customer requirements for a product can therefore help the product owner propose the right product to the right customer. This study aims to classify customer needs for products posted in online social media community groups, focusing on understanding customer need phrases written in Thai and classifying them into concepts. The proposed model uses a bag-of-concepts representation built in four steps: pattern analysis, which applies n-grams together with part-of-speech (POS) tags and synonym replacement; conceptual analysis; pattern matching; and class labeling, which applies concept sets obtained from the FP-Growth algorithm and represents each concept by its TF-IDF value in the bag of concepts. The effectiveness of the proposed model is evaluated with five classification algorithms: Decision Tree, Support Vector Machine, Naïve Bayes, K-Nearest Neighbor, and RBF Neural Network. The results show that Decision Tree yields higher accuracy and F-measure values than the others. This study is an initial step toward a personalized product recommendation system; future work will apply the model to the remaining domains.
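To make the bag-of-concepts idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes posts are already tokenized, uses a small hypothetical English synonym-to-concept lexicon (`CONCEPT_OF`) in place of the Thai resources, and weights each concept with a plain TF-IDF formula (term frequency times the natural log of the inverse document frequency). The n-gram and POS pattern analysis and the FP-Growth concept sets are omitted.

```python
import math
from collections import Counter

# Hypothetical synonym-to-concept lexicon (English stand-ins, not from the paper).
CONCEPT_OF = {
    "want": "NEED", "need": "NEED", "looking": "NEED",
    "phone": "PHONE", "smartphone": "PHONE", "mobile": "PHONE",
    "cheap": "LOW_PRICE", "affordable": "LOW_PRICE",
    "camera": "CAMERA",
}


def to_concepts(tokens):
    """Replace each token by its concept label; drop tokens with no concept."""
    return [CONCEPT_OF[t] for t in tokens if t in CONCEPT_OF]


def bag_of_concepts_tfidf(docs):
    """Return one {concept: tf-idf weight} dict per tokenized document."""
    concept_docs = [to_concepts(d) for d in docs]
    n_docs = len(concept_docs)
    # Document frequency: in how many documents each concept appears.
    doc_freq = Counter(c for cd in concept_docs for c in set(cd))
    vectors = []
    for cd in concept_docs:
        counts = Counter(cd)
        total = len(cd) or 1  # avoid division by zero for empty documents
        vectors.append({
            c: (n / total) * math.log(n_docs / doc_freq[c])
            for c, n in counts.items()
        })
    return vectors


# Three toy posts (assumed to be already word-segmented).
posts = [
    ["want", "cheap", "phone"],
    ["looking", "affordable", "smartphone", "camera"],
    ["need", "camera"],
]
vectors = bag_of_concepts_tfidf(posts)
```

In this toy data the `NEED` concept occurs in every post, so its weight is zero, while rarer concepts such as `PHONE` and `CAMERA` receive positive weights. The resulting concept vectors are the kind of input that a classifier such as a decision tree would consume.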
