Neural Computing and Applications, vol. 36, no. 27, pp. 17125-17143, 2024 (SCI-Expanded)
Sensors perceive the environment with different capabilities. Sensor fusion plays a crucial role in achieving better perception by combining information acquired at different times. However, the decisions derived from different observations may conflict with each other because of differences in the processing algorithms, their thresholds, and the perceptive characteristics of the sensors. This study presents a late fusion method applied to the outputs of deep learning models fed with camera and lidar measurement data. For the camera sensor, a multi-task deep learning network is proposed that classifies cars, motorcycles, bicycles, buses, trucks, and pedestrians as dynamic traffic objects. In addition, color-classified traffic lights and traffic signs are handled as static traffic objects, and the network also segments the drivable area and detects lane lines. The proposed multi-task network is trained and tested on the BDD100K dataset and benchmarked against publicly available multi-task networks. The presented method is the second-fastest multi-task network, running at 52 FPS, and ranks second in drivable-area segmentation and lane-line detection performance. For segmentation of dynamic objects, performance improves by 22.45%, and the overall mIoU improves by 3.96%. For the lidar sensor, objects are detected with a separate model operating on the lidar modality. The two sensors' outputs are fused by the proposed fusion algorithm, and the results are evaluated on the KITTI dataset. The proposed fusion methodology outperforms stand-alone lidar methods by about 3.58% and 3.63% in BEV and 3D detection mAP, respectively. Overall, benchmarking against two distinct fusion approaches demonstrates the effectiveness of the proposed method.
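To illustrate the general late-fusion idea summarized above, the sketch below greedily matches camera and lidar detections by 2D IoU and averages their confidences. It is a minimal sketch under stated assumptions, not the paper's actual algorithm: the `Detection` class, the `iou_thr` threshold, the score-averaging rule, and the prior projection of lidar 3D boxes into the image plane are all illustrative assumptions introduced here.

```python
# Minimal late-fusion sketch (illustrative only, not the paper's exact method).
# Assumption: lidar 3D detections have already been projected to 2D image boxes,
# so both modalities provide axis-aligned boxes (x1, y1, x2, y2) with scores.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image coordinates
    score: float                            # detector confidence in [0, 1]
    label: str                              # e.g. "car", "pedestrian"

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def late_fuse(cam: List[Detection], lidar: List[Detection],
              iou_thr: float = 0.5) -> List[Detection]:
    """Greedy score-level fusion: combine lidar detections confirmed by the
    camera, keep unmatched detections from either sensor as-is."""
    fused, used_cam = [], set()
    for ld in lidar:
        best_j, best_iou = -1, iou_thr
        for j, cd in enumerate(cam):
            if j in used_cam or cd.label != ld.label:
                continue
            o = iou(ld.box, cd.box)
            if o > best_iou:
                best_j, best_iou = j, o
        if best_j >= 0:
            used_cam.add(best_j)
            # Simple confidence fusion: average of the two detector scores.
            fused.append(Detection(ld.box,
                                   0.5 * (ld.score + cam[best_j].score),
                                   ld.label))
        else:
            fused.append(ld)
    # Keep camera-only detections that no lidar box matched.
    fused.extend(cd for j, cd in enumerate(cam) if j not in used_cam)
    return fused
```

In this sketch, a detection confirmed by both sensors receives a fused score, while sensor-exclusive detections are retained, which reflects the general motivation for late fusion of conflicting per-sensor decisions; the specific matching and scoring rules of the published method are described in the full paper.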