Pedestrian detection at daytime and nighttime conditions based on YOLO-v5


Bryan Montenegro
Marco Flores


This paper presents a new deep learning algorithm, named multispectral, for daytime and nighttime pedestrian detection, aimed at vehicular safety applications. The proposal is based on YOLO-v5 and consists of two subnetworks that work with color (RGB) and thermal (IR) images, respectively. A merging subnetwork then integrates the RGB and IR networks to obtain a pedestrian detector. Experiments to verify the quality of the proposal were conducted on several public pedestrian databases covering daytime and nighttime conditions. The main results, measured by the mAP metric at an IoU threshold of 0.5, were: 96.6 % on INRIA, 89.2 % on CVC09, 90.5 % on LSIFIR, 56 % on FLIR-ADAS, 79.8 % on CVC14, 72.3 % on NightOwls, and 53.3 % on KAIST.
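To illustrate the IoU threshold of 0.5 behind the reported mAP figures, the following is a minimal sketch of how the overlap between a predicted and a ground-truth box is commonly computed. The function names `iou` and `is_true_positive` and the `(x1, y1, x2, y2)` corner format are assumptions made for illustration; they are not taken from the paper or from the YOLO-v5 code base.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).

    The corner format is an assumption for this sketch.
    """
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: a detection counts as correct when IoU >= threshold."""
    return iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    predicted = (0, 0, 10, 10)
    ground_truth = (0, 2, 10, 12)
    print(iou(predicted, ground_truth))
    print(is_true_positive(predicted, ground_truth))
```

A full mAP computation additionally ranks detections by confidence and integrates the precision-recall curve; this sketch covers only the overlap test applied to each detection.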

