Robust Pose Estimation of Pedestrians with Deep Neural Networks

Chuho Yi (1), Jungwon Cho (2)
(1) Department of AI Convergence, Hanyang Women's University, Seoul, Republic of Korea
(2) Department of Computer Education, Jeju National University, Jeju, Republic of Korea
How to cite (IJASEIT):
Yi, Chuho, and Jungwon Cho. “Robust Pose Estimation of Pedestrians With Deep Neural Networks”. International Journal on Advanced Science, Engineering and Information Technology, vol. 13, no. 4, Aug. 2023, pp. 1561-5, doi:10.18517/ijaseit.13.4.19022.
In this paper, we present a method for robust pedestrian pose estimation, aimed in particular at autonomous vehicles approaching pedestrians who are still far away. Distant pedestrians appear small in the camera image, and because they are non-rigid bodies captured at low resolution from a moving vehicle, conventional image processing methods struggle to detect them and estimate their orientation. To address these limitations, we fuse deep neural networks (DNNs) in stages: one DNN detects pedestrians, a second enlarges the detected low-resolution region, and a third estimates the pose from the enlarged image. The proposed method thus uses a single camera to estimate the posture of a distant pedestrian, solving through multi-stage fusion of DNNs a problem that is difficult for a single DNN. The experimental results establish the superiority of the proposed method on data that challenge conventional pose estimation methods. Beyond autonomous driving, the method applies to observing small or distant objects and may be especially useful in surveillance systems, sports broadcasting, and other applications requiring human posture estimation.
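The staged pipeline described above is straightforward to prototype. The following is a minimal sketch, not the authors' implementation: off-the-shelf torchvision models stand in for the paper's detection and pose-estimation DNNs, and bicubic upsampling stands in for the learned image-enlargement network; the model choices, the min_score threshold, and the scale factor are all illustrative assumptions.

    # Stage 1: detect pedestrians; Stage 2: enlarge each small crop;
    # Stage 3: estimate pose on the enlarged crop.
    import torch
    import torch.nn.functional as F
    from torchvision.models.detection import (
        fasterrcnn_resnet50_fpn,
        keypointrcnn_resnet50_fpn,
    )

    detector = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
    pose_net = keypointrcnn_resnet50_fpn(weights="DEFAULT").eval()
    PERSON = 1  # COCO class id for "person"

    @torch.no_grad()
    def far_pedestrian_poses(image, min_score=0.7, scale=4):
        """image: float tensor of shape (3, H, W), values in [0, 1]."""
        det = detector([image])[0]
        poses = []
        for box, label, score in zip(det["boxes"], det["labels"], det["scores"]):
            if label.item() != PERSON or score < min_score:
                continue
            x0, y0, x1, y1 = box.int().tolist()
            crop = image[:, y0:y1, x0:x1]
            if crop.shape[1] < 2 or crop.shape[2] < 2:
                continue  # skip degenerate boxes
            # Bicubic upsampling stands in for the learned enlargement DNN.
            big = F.interpolate(crop[None], scale_factor=scale, mode="bicubic",
                                align_corners=False)[0].clamp(0.0, 1.0)
            kp = pose_net([big])[0]
            if kp["keypoints"].shape[0] == 0:
                continue
            # Map the 17 COCO keypoints back to original image coordinates.
            pts = kp["keypoints"][0].clone()  # (17, 3): x, y, visibility
            pts[:, 0] = pts[:, 0] / scale + x0
            pts[:, 1] = pts[:, 1] / scale + y0
            poses.append(pts)
        return poses

In a real system, the interpolation step would be replaced by a learned super-resolution network, and the per-frame pose estimates could be smoothed over time (for example, with a Kalman filter) as the vehicle approaches the pedestrian.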
