X Bot Detection Using One-Class Classification Methods with Isolation Forest Algorithm

Yusup Miftahuddin (1), Muhammad Haydar Al-Ghifary (2)
(1) Department of Informatics, Bandung National Institute of Technology (Itenas), Jl. Phh. Mustofa No. 23, Bandung, Indonesia
(2) Department of Informatics, Bandung National Institute of Technology (Itenas), Jl. Phh. Mustofa No. 23, Bandung, Indonesia
How to cite (IJASEIT):
Miftahuddin, Yusup, and Muhammad Haydar Al-Ghifary. “X Bot Detection Using One-Class Classification Methods With Isolation Forest Algorithm”. International Journal on Advanced Science, Engineering and Information Technology, vol. 14, no. 4, Aug. 2024, pp. 1233-9, doi:10.18517/ijaseit.14.4.19364.
Bots on X are a significant problem in the social media landscape, and many shared links originate from bot-like accounts. This study applies the Isolation Forest algorithm, used here as a one-class classification method, to identify bots as anomalies in X account details. The dataset merges data from Botometer with supplementary metrics such as ‘average tweets per day’ and ‘account age in days’, contributed by David Martín Gutiérrez; it was chosen because access to the X API has become increasingly difficult. It comprises 37,438 instances: 25,013 labeled as human accounts and 12,425 labeled as bot accounts. Pre-processing removes irrelevant features, and the dataset is split into training, validation, and test sets in a 70:15:15 ratio. Hyperparameter and threshold tuning on the training set selects the best configuration for this dataset (n_estimators: 50, contamination: 0.5, bootstrap: True), which achieves a training set F1-score of 0.211001. Despite this tuning, the model's performance remains low: on the test set it yields precision, recall, and F1-score values of 0.1801, 0.2795, and 0.2190, respectively, with a ROC AUC score of 0.3272. While the Isolation Forest algorithm shows promise for detecting X bots, these results indicate that it may not be the most suitable algorithm for this bot detection task on this dataset. Future work will explore techniques to improve bot detection performance and support a more comprehensive analysis.
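The reported test metrics can be sanity-checked with a few lines of standard-library Python. The sketch below is not the study's code: it recomputes F1 as the harmonic mean of the reported precision and recall, and it uses a small, entirely hypothetical set of labels and scores (`toy_labels`, `toy_scores`) to illustrate what a ROC AUC below 0.5 means, namely that the scores rank the positive class below the negative class more often than not.

```python
from itertools import product


def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a toy example but slow on large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Test-set precision and recall as reported in the abstract.
reported_f1 = f1_score(0.1801, 0.2795)
print(f"F1 from reported precision and recall: {reported_f1:.4f}")

# Hypothetical toy data, NOT from the study: 1 = bot, 0 = human,
# and a higher score is meant to signal a more anomalous account.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.2, 0.3, 0.6, 0.4, 0.5, 0.7, 0.8]

auc = roc_auc(toy_labels, toy_scores)
auc_flipped = roc_auc(toy_labels, [-s for s in toy_scores])
print(f"toy AUC: {auc:.4f}, AUC with negated scores: {auc_flipped:.4f}")
```

The recomputed F1 agrees with the reported 0.2190 up to rounding of the inputs. In the toy data, negating the scores turns an AUC of `a` into `1 - a`, so an AUC below 0.5 means the ranking is anti-correlated with the label rather than merely uninformative. Whether the same holds in a useful way for the study's model (for example, if human accounts are the ones being isolated) cannot be concluded from this sketch.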

This work is licensed under a Creative Commons Attribution 4.0 International License.
