Robust Estimation of Crowd Density Using Vision Transformers
This work is licensed under a Creative Commons Attribution 4.0 International License.