Comparative Analysis and Exploration of 3DResNet-18 and InceptionV3-GRU with Temporal Segment Network (TSN) Framework for Laparoscopic Surgical Video Expertise Classification

Liem Roy Marcelino (1), Samuel Batara Kelengate Munthe (2), Ronny Dominikus Munthe (3), Eko Adi Sarwoko (4), Aris Sugiharto (5), Helmie Arif Wibawa (6), Aris Puji Widodo (7), Fajar Agung Nugroho (8), Anis Farihan Binti Mat Raffei (9), Adi Wibowo (10)
(1) Department of Informatics, Universitas Diponegoro, Semarang, Indonesia
(2) Department of Informatics, Universitas Diponegoro, Semarang, Indonesia
(3) Department of Informatics, Universitas Diponegoro, Semarang, Indonesia
(4) Department of Informatics, Universitas Diponegoro, Semarang, Indonesia
(5) Department of Informatics, Universitas Diponegoro, Semarang, Indonesia
(6) Department of Informatics, Universitas Diponegoro, Semarang, Indonesia
(7) Department of Informatics, Universitas Diponegoro, Semarang, Indonesia
(8) Department of Informatics, Universitas Diponegoro, Semarang, Indonesia
(9) Faculty of Computing, Universiti Malaysia Pahang Al-Sultan Abdullah, Pekan, Pahang, Malaysia
(10) Universitas Diponegoro, Semarang, Indonesia
How to cite (IJASEIT) :
[1] L. R. Marcelino et al., “Comparative Analysis and Exploration of 3DResNet-18 and InceptionV3-GRU with Temporal Segment Network (TSN) Framework for Laparoscopic Surgical Video Expertise Classification,” Int. J. Adv. Sci. Eng. Inf. Technol., vol. 15, no. 2, pp. 353–362, Apr. 2025.
Laparoscopic surgery is a widely adopted minimally invasive procedure that requires surgeons to master complex skills such as suturing, knot-tying, and needle-passing. Traditional assessment of these skills is often subjective and prone to bias, relying heavily on manual evaluation by expert surgeons, which can vary between evaluators. To address this issue and move toward a more objective and standardized assessment method, we applied deep learning models to automate surgical skill evaluation. In this study, we utilized two advanced architectures, 3D ResNet-18 and InceptionV3-GRU, within a Temporal Segment Network (TSN) framework to classify skill levels using the publicly available JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS) dataset. We focused on optimizing temporal sampling by adjusting the number of frames and the frame interval in the video data. Our findings show that capturing longer sequences of actions improved accuracy for the suturing and needle-passing tasks, while capturing more detailed motions enhanced performance for knot-tying. The 3D ResNet-18 model achieved 100% accuracy across all tasks, significantly outperforming the InceptionV3-GRU model, which achieved 85.71% for suturing, 77.42% for knot-tying, and 100% for needle-passing. These results demonstrate the superior capability of the 3D ResNet-18 model in surgical skill classification and highlight the critical role of temporal optimization in improving performance across different surgical tasks.
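To make the temporal-sampling idea concrete, the sketch below shows, in PyTorch, how TSN-style segment sampling could feed a 3D ResNet-18 backbone whose per-segment predictions are averaged into a video-level skill label. It is a minimal illustration rather than the authors' implementation: the helper sample_tsn_indices, the wrapper class TSN3DResNet, and hyperparameters such as num_segments, frames_per_segment, and interval are illustrative assumptions, and torchvision's r3d_18 stands in for the 3D ResNet-18 backbone.

```python
# Minimal sketch (not the paper's exact pipeline) of TSN-style temporal
# sampling with a 3D ResNet-18 backbone, assuming PyTorch/torchvision.
# Segment count, frames per segment, and frame interval are illustrative.
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18


def sample_tsn_indices(num_video_frames: int, num_segments: int = 8,
                       frames_per_segment: int = 4, interval: int = 2):
    """Split the video into equal segments and take a short clip
    (frames_per_segment frames, `interval` frames apart) from each."""
    seg_len = num_video_frames // num_segments
    indices = []
    for s in range(num_segments):
        start = s * seg_len
        clip = [min(start + i * interval, num_video_frames - 1)
                for i in range(frames_per_segment)]
        indices.append(clip)
    return indices  # num_segments lists of frame indices


class TSN3DResNet(nn.Module):
    """3D ResNet-18 applied per segment; segment logits are averaged
    (TSN average consensus) to give a video-level skill prediction."""

    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.backbone = r3d_18(weights=None)  # optionally load Kinetics weights
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, num_segments, channels, frames, height, width)
        b, s, c, t, h, w = clips.shape
        logits = self.backbone(clips.reshape(b * s, c, t, h, w))
        return logits.reshape(b, s, -1).mean(dim=1)  # average consensus


if __name__ == "__main__":
    print(sample_tsn_indices(240)[0])        # frame indices for the first segment
    model = TSN3DResNet(num_classes=3)       # e.g., novice / intermediate / expert
    dummy = torch.randn(1, 8, 3, 4, 112, 112)  # 8 segments of 4 frames each
    print(model(dummy).shape)                # torch.Size([1, 3])
```

Under this scheme, enlarging the frame interval (or the number of segments) widens the temporal window each clip covers, favoring longer action sequences, while a smaller interval captures finer-grained motion; this mirrors the trade-off the abstract reports between suturing/needle-passing and knot-tying.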
