SEED4D

1Continental AG, 2University of Freiburg, 3Australian National University,
4University of Oxford, 5University of Tübingen

Abstract

SEED4D is a large-scale synthetic multi-view dynamic 4D driving dataset, data generator, and benchmark. Models for egocentric 3D and 4D reconstruction, including few-shot interpolation and extrapolation settings, can benefit from having images from exocentric viewpoints as supervision signals. No existing dataset provides the necessary mixture of complex, dynamic, and multi-view data. To facilitate the development of 3D and 4D reconstruction methods in the autonomous driving context, we propose the Synthetic Ego--Exo Dynamic 4D (SEED4D) dataset. SEED4D comprises two large-scale multi-view synthetic urban scene datasets. Our static (3D) dataset contains 212K inward- and outward-facing vehicle images from 2K scenes, while our dynamic (4D) dataset contains 16.8M images from 10K trajectories, each sampled at 100 points in time with egocentric images, exocentric images, and LiDAR data. We additionally present a customizable, easy-to-use data generator for spatio-temporal multi-view data creation. Our open-source data generator allows the creation of synthetic data for camera setups commonly used in the NuScenes, KITTI360, and Waymo datasets.


Datasets

Static Ego-Exo Dataset. We introduce a novel dataset for few-view image reconstruction tasks in an autonomous driving setting. Our dataset contains 2002 single-timestep complex outdoor driving scenes, each offering six plus one outward-facing vehicle images and 100 images from exocentric viewpoints on a bounding sphere for supervision. Only a single vehicle in each scene is equipped with this setup. The ego views are 900 × 1600 to match the NuScenes dataset, and the exocentric surround-vehicle views are 600 × 800.

Dynamic Ego-Exo Dataset. Our temporal dataset consists of 10.5K driving trajectories well-suited for 4D forecasting, 4D reconstruction, or video prediction tasks. Each trajectory is 100 steps long, corresponding to 10 seconds of driving. The 10.5K trajectories come from a total of 498 scenes across all towns. In each scene, 21 vehicles are each equipped with six plus one outward-facing vehicle cameras and ten inward-facing surround-vehicle exocentric cameras. The ego views are 128 × 256 and the exo views are 98 × 128.

The static and the dynamic ego--exo view datasets are visualized in the figure below. They differ mainly in image resolution and trajectory length and have complementary strengths. The static ego--exo dataset contains 12K egocentric views and 200K exocentric views. The dynamic ego--exo dataset contains 6.3M egocentric views and 10.5M exocentric views.

[Figure: the static and dynamic ego--exo view datasets.]
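To make the layout concrete, below is a minimal Python loading sketch for a static scene; the folder names (`ego/`, `exo/`) and the scene path are placeholders for illustration, not the dataset's actual structure.

```python
from pathlib import Path

import imageio.v3 as iio


def load_static_scene(scene_dir: str):
    """Load the egocentric and exocentric RGB views of one static scene.

    The folder names `ego/` and `exo/` are hypothetical placeholders;
    consult the released dataset for the actual directory layout.
    """
    scene = Path(scene_dir)
    # 6+1 outward-facing egocentric views (900 x 1600).
    ego = [iio.imread(p) for p in sorted((scene / "ego").glob("*.png"))]
    # 100 exocentric supervision views on the bounding sphere (600 x 800).
    exo = [iio.imread(p) for p in sorted((scene / "exo").glob("*.png"))]
    return ego, exo


ego_views, exo_views = load_static_scene("SEED4D/static/scene_0001")
print(len(ego_views), len(exo_views))  # expected: 7 and 100
```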

Benchmark Results

Multi-view Novel View Synthesis. We evaluate how well existing methods can reconstruct the scene given many spherical views. We divide the 100 spherical views into training and test data using an 80/20 split.
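A minimal sketch of this protocol, assuming the 100 spherical views of a scene are available as image arrays; the fixed seed and helper name are illustrative rather than the benchmark's official split code.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Deterministic 80/20 split of the 100 spherical views (seed is illustrative).
rng = np.random.default_rng(0)
indices = rng.permutation(100)
train_idx, test_idx = indices[:80], indices[80:]


def evaluate_views(renders, targets):
    """Average PSNR/SSIM over held-out views given as uint8 (H, W, 3) arrays."""
    psnr = np.mean([peak_signal_noise_ratio(t, r) for r, t in zip(renders, targets)])
    ssim = np.mean([structural_similarity(t, r, channel_axis=-1) for r, t in zip(renders, targets)])
    return float(psnr), float(ssim)
```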

Monocular Metric Depth Estimation. Since our dataset contains ground-truth depth maps, we evaluated two recent monocular metric depth estimation methods. Both methods were tested without further fine-tuning on our data.
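A small sketch of how the reported depth error could be computed; the validity mask and 80 m cutoff are assumptions, not the benchmark's official settings.

```python
import numpy as np


def depth_rmse(pred: np.ndarray, gt: np.ndarray, max_depth: float = 80.0) -> float:
    """RMSE in metres between predicted and ground-truth depth maps.

    Invalid pixels and everything beyond `max_depth` (e.g. sky) are masked out;
    the 80 m cutoff is an assumption, not the benchmark's official setting.
    """
    mask = (gt > 0) & (gt < max_depth)
    return float(np.sqrt(np.mean((pred[mask] - gt[mask]) ** 2)))
```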

Single-shot Few-Image Scene Reconstruction. For few-image-to-3D reconstruction, we deviate from many existing comparisons by targeting an automotive use case: we evaluate methods on egocentric outward-facing input views and supervise the resulting novel views with 360° exocentric spherical views.
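Perceptual quality of the rendered exocentric views can be measured, for instance, with the lpips package, as in the generic sketch below; this is not the paper's exact evaluation code.

```python
import torch
import lpips  # pip install lpips

# LPIPS perceptual distance (lower is better) between a rendered exocentric
# view and its ground-truth supervision image. Inputs are float tensors of
# shape (1, 3, H, W) scaled to [-1, 1].
loss_fn = lpips.LPIPS(net="alex")


def lpips_score(render: torch.Tensor, target: torch.Tensor) -> float:
    with torch.no_grad():
        return loss_fn(render, target).item()
```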

| Evaluation | Method | PSNR | SSIM | LPIPS | RMSE (m) |
|---|---|---|---|---|---|
| Multi-view | SplatFacto [3,4] | 24.458 | 0.806 | 0.210 | - |
| Multi-view | NeRFacto [5] | 24.936 | 0.804 | 0.227 | - |
| Multi-view | K-Planes [1,2] | 25.744 | 0.816 | 0.239 | - |
| Monocular Depth | ZoeDepth [7] | - | - | - | 12.352 |
| Monocular Depth | Metric3D [6] | - | - | - | 7.668 |
| Few-view | K-Planes [1,2] | 11.356 | 0.463 | 0.633 | - |
| Few-view | SplatFacto [3,4] | 11.607 | 0.486 | 0.658 | - |
| Few-view | NeRFacto [5] | 10.943 | 0.298 | 0.791 | - |
| Few-view | PixelNeRF [8] | 14.500 | 0.550 | 0.652 | 19.235 |
| Few-view | SplatterImage [9] | 17.791 | 0.580 | 0.568 | 11.049 |
| Few-view | 6Img-to-3D [10] | 18.682 | 0.726 | 0.451 | 6.232 |

We welcome submissions. Please reach out to us via email!

Data Generator

Both datasets presented in this paper were generated with our data generator. The data generator provides an easy-to-use front-end for the CARLA simulator. With it, one can easily define parameters such as the town, the vehicle's initial position, the weather, the number of traffic participants, and the number, type, and position of sensors (both ego and exo views).
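A hypothetical configuration sketch follows; the parameter names and structure are assumptions for illustration, and the real interface is documented with the data generator.

```python
# Hypothetical configuration sketch -- parameter names are illustrative only;
# see the SEED4D data generator for the real interface.
config = {
    "town": "Town02",            # CARLA map
    "spawn_point": 42,           # index of the vehicle's initial position
    "weather": "ClearNoon",      # CARLA weather preset
    "num_vehicles": 21,          # number of traffic participants
    "sensors": {
        "ego": ["rgb", "depth", "semantic_segmentation", "lidar"],  # outward-facing rig (e.g. NuScenes-style)
        "exo": {"type": "rgb", "count": 100, "layout": "sphere"},   # supervision views on a bounding sphere
    },
}
```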

Several different sensor readings are available within our dataset:

The dataset provides RGB images, depth maps, and semantic and instance segmentation for each view. Furthermore, the 3D bounding boxes of each vehicle in the scene are also available.
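If the depth maps are stored in CARLA's standard 24-bit RGB encoding (an assumption; the released data may already ship decoded depth), they can be converted to metres as in this sketch:

```python
import numpy as np


def carla_depth_to_meters(depth_rgb: np.ndarray) -> np.ndarray:
    """Decode CARLA's 24-bit RGB depth encoding into metres.

    Assumes the raw CARLA format, where depth is packed into the R, G, and B
    channels and normalised to a 1000 m far plane; if the dataset already
    provides decoded depth maps, this step is unnecessary.
    """
    r = depth_rgb[..., 0].astype(np.float64)
    g = depth_rgb[..., 1].astype(np.float64)
    b = depth_rgb[..., 2].astype(np.float64)
    normalized = (r + g * 256.0 + b * 256.0 ** 2) / (256.0 ** 3 - 1)
    return normalized * 1000.0
```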

Towns

The generated data come from Towns 1 to 7 and 10HD. Towns 1, 3 to 7, and 10HD are used for training, and all scenes from Town 2 are held out for evaluation.

| Town | Number of Scenes | Split | Description |
|---|---|---|---|
| Town01 | 255 | Train | A small, simple town with a river and several bridges. |
| Town03 | 265 | Train | A larger, urban map with a roundabout and large junctions. |
| Town04 | 372 | Train | A small town embedded in the mountains with a special "figure of 8" infinite highway. |
| Town05 | 302 | Train | A squared-grid town with cross junctions and a bridge. It has multiple lanes per direction, useful for performing lane changes. |
| Town06 | 436 | Train | Long many-lane highways with many highway entrances and exits. It also has a Michigan left. |
| Town07 | 116 | Train | A rural environment with narrow roads, corn, barns, and hardly any traffic lights. |
| Town10HD | 155 | Train | A downtown urban environment with skyscrapers, residential buildings, and an ocean promenade. |
| Town02 | 101 | Val | A small, simple town with a mixture of residential and commercial buildings. |
[Figure: example egocentric input images and exocentric supervision images for Towns 1 to 7.]

BibTeX

@misc{kaestingschaefer2024seed4dsyntheticegoexodynamic,
      title={SEED4D: A Synthetic Ego--Exo Dynamic 4D Data Generator, Driving Dataset and Benchmark}, 
      author={Marius Kästingschäfer and Théo Gieruc and Sebastian Bernhard and Dylan Campbell and Eldar Insafutdinov and Eyvaz Najafli and Thomas Brox},
      year={2024},
      eprint={2412.00730},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.00730}, 
}

Acknowledgement

We thank Nerfies for the website template.