Deep Vision of Mobility for Urban Detection from Street-View Projections

Akgül B. A., PARLAK İ. B., Adel M.

14th International Conference on Image Processing, Theory, Tools and Applications, IPTA 2025, İstanbul, Türkiye, 13 - 16 October 2025 (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
DOI Number: 10.1109/ipta66025.2025.11222018
City of Publication: İstanbul
Country of Publication: Türkiye
Keywords: computer vision, image processing, object detection, visual perception, YOLO
Affiliated with Galatasaray University: Yes

Abstract

Urban mobility demands intelligent systems that can perceive and interpret complex street environments. This study presents a deep learning-based, vision-only framework for traffic scene understanding that relies on 2D projections of 360-degree panoramic imagery. Our model combines YOLOv8 and YOLOv10 to detect roads, vehicles, and pedestrians in diverse urban settings. A custom dataset of 3,052 manually labeled images from two cities, Paris and Marseille, was collected from Google Street View. Results confirm that vision-only models can accurately recognize urban elements, with Precision, Recall, and mAP scores competitive with sensor-rich methods. While some limitations remain, such as detecting small or occluded pedestrians, the findings point to a future where scalable, low-cost vision systems power autonomous navigation. This work contributes both a novel dataset and a benchmark for 3D panoramic urban analysis.