One-Minute Paper (泡泡一分钟): BLVD: Building A Large-scale 5D Semantics Benchmark for Autonomous Driving

BLVD: Building A Large-scale 5D Semantics Benchmark for Autonomous Driving


Jianru Xue, Jianwu Fang, Tao Li, Bohua Zhang, Pu Zhang, Zhen Ye and Jian Dou

Abstract—In the autonomous driving community, numerous benchmarks have been established to assist the tasks of 3D/2D object detection, stereo vision, and semantic/instance segmentation. However, the more meaningful dynamic evolution of the objects surrounding the ego-vehicle is rarely exploited, and it lacks a large-scale dataset platform. To address this, we introduce BLVD, a large-scale 5D semantics benchmark that does not concentrate on the static detection or semantic/instance segmentation tasks tackled adequately before. Instead, BLVD aims to provide a platform for the tasks of dynamic 4D (3D+temporal) tracking, 5D (4D+interactive) interactive event recognition, and intention prediction. This benchmark will support a deeper understanding of traffic scenes than ever before. In total, we provide 249,129 3D annotations, 4,902 independent individuals for tracking with an overall length of 214,922 points, 6,004 valid fragments for 5D interactive event recognition, and 4,900 individuals for 5D intention prediction. These tasks are set in four kinds of scenarios depending on the object density (low and high) and light conditions (daytime and nighttime). The benchmark can be downloaded from our project site https://github.com/VCCIV/BLVD/.


In this paper, we build a large-scale 5D semantics benchmark for autonomous driving, captured under a variety of interesting scenarios and effectively and accurately calibrated, synchronized, and rectified. Unlike previous static detection/segmentation tasks, we focus on a deeper understanding of traffic scenes. Specifically, the tasks of 4D tracking, 5D interactive event recognition, and 5D intention prediction are launched on this benchmark. Through careful annotation, the benchmark yields 249,129 3D annotations, 4,902 independent instances for tracking with an overall length of 214,922 points, 6,004 valid fragments for 5D interactive event recognition, and 4,900 individuals for 5D intention prediction. These annotations were collected under different light conditions (daytime and nighttime), different densities of participants (low and high), and different driving scenarios (highway and urban).
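The paper does not spell out its annotation file layout here, but the task structure it describes — per-frame 3D boxes linked over time into tracklets, with event and intention labels attached to those tracklets — suggests a simple record type. Below is a minimal Python sketch under that assumption; the field names (`frame`, `track_id`, `event`, `intention`) and the grouping helper are hypothetical illustrations, not the actual BLVD schema from https://github.com/VCCIV/BLVD/.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class Annotation3D:
    """One hypothetical per-frame 3D annotation (field names are
    assumptions, not the published BLVD format)."""
    frame: int           # temporal index -> the "4D" (3D + time) dimension
    track_id: int        # identity linking the same object across frames
    category: str        # e.g. "vehicle", "pedestrian", "rider"
    box: tuple           # (x, y, z, l, w, h, yaw) in the ego-vehicle frame
    event: str = ""      # 5D interactive event label, if annotated
    intention: str = ""  # 5D intention label, if annotated

def group_into_fragments(annotations):
    """Group per-frame 3D annotations into per-object tracking fragments,
    i.e. the 4D tracklets that 5D event/intention labels attach to."""
    fragments = defaultdict(list)
    for ann in annotations:
        fragments[ann.track_id].append(ann)
    # Sort each fragment by time so it forms a continuous trajectory.
    for frag in fragments.values():
        frag.sort(key=lambda a: a.frame)
    return dict(fragments)
```

Grouping by `track_id` and sorting by `frame` reproduces the "4D" view of the data: each fragment is one object's 3D trajectory through time, which is the unit the 5D event and intention labels describe.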

Original source: https://www.cnblogs.com/feifanrensheng/p/11368354.html