A Comprehensive Review of Machine Learning Approaches for Detecting Malicious Software

Liu Yuanming (1), Rodziah Latih (2)
(1) Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Selangor, 43600, Malaysia
(2) Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Selangor, 43600, Malaysia
How to cite (IJASEIT):
Yuanming, Liu, and Rodziah Latih. “A Comprehensive Review of Machine Learning Approaches for Detecting Malicious Software”. International Journal on Advanced Science, Engineering and Information Technology, vol. 14, no. 3, June 2024, pp. 826-34, doi:10.18517/ijaseit.14.3.19993.
With the continuous development of technology, the number of malware types and variants continues to grow, posing an enormous challenge to network security. Malware uses a variety of techniques to deceive or evade detection, so traditional signature-based and rule-based identification methods are no longer effective. In recent years, many machine learning algorithms have attracted widespread academic attention as powerful methods for malware detection and classification. Drawing on an in-depth study of the literature and a survey of recent research results, this review takes feature extraction as the basis for classifying existing approaches. By extracting meaningful features from malware samples, such as behavioral patterns, code structures, and file attributes, researchers can identify the characteristics that distinguish malicious software from benign software. This process is the foundation for developing effective detection models and for understanding the underlying mechanisms of malware behavior. We divide the approaches into two categories for investigation: feature engineering and learning-based methods. Feature engineering involves selecting and extracting relevant features from raw data, while learning-based methods use machine learning algorithms to analyze and classify malware based on these features. Supervised, unsupervised, and deep learning techniques have shown promise in accurately detecting and classifying malware, even in the face of evolving threats. On this basis, we further examine the problems and challenges that malware identification research currently faces.
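
The two stages described in the abstract, feature extraction followed by learning-based classification, can be illustrated with a minimal sketch. It is not a method taken from the review: byte bigrams stand in for the opcode or API-call n-grams used in the surveyed work, a nearest-centroid rule stands in for a real learning algorithm, and the byte strings, labels, and function names (`byte_ngrams`, `train`, `predict`) are all hypothetical.

```python
import math
from collections import Counter

def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Feature extraction: count overlapping byte n-grams in a sample."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def normalize(counts: Counter) -> dict:
    """Scale counts to unit L2 norm so sample length matters less."""
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {k: v / norm for k, v in counts.items()}

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train(samples):
    """Learning step (assumed nearest-centroid): average vectors per label."""
    grouped = {}
    for data, label in samples:
        grouped.setdefault(label, []).append(normalize(byte_ngrams(data)))
    model = {}
    for label, vectors in grouped.items():
        total = Counter()
        for vec in vectors:
            for key, value in vec.items():
                total[key] += value
        model[label] = {k: v / len(vectors) for k, v in total.items()}
    return model

def predict(model, data: bytes) -> str:
    """Assign the label whose centroid is most similar to the sample."""
    vec = normalize(byte_ngrams(data))
    return max(model, key=lambda label: cosine(vec, model[label]))

# Made-up toy samples; real work would use labeled executables or APKs.
TOY_SAMPLES = [
    (b"\x90\x90\xeb\xfe\x90\x90\xeb\xfe", "malicious"),
    (b"\xeb\xfe\x90\x90\xeb\xfe\x90\x90", "malicious"),
    (b"\x55\x8b\xec\x5d\xc3\x55\x8b\xec", "benign"),
    (b"\x8b\xec\x5d\xc3\x55\x8b\xec\x5d", "benign"),
]
model = train(TOY_SAMPLES)
```

Calling `predict(model, sample_bytes)` returns one of the training labels. In practice the feature extractor would operate on disassembled opcodes, API traces, or permissions, and the classifier would be replaced by one of the supervised, unsupervised, or deep learning models the review surveys.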

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.
