SEED4D is a large-scale synthetic multi-view dynamic 4D driving dataset, data generator, and benchmark suite. Models for egocentric 3D and 4D reconstruction, including few-shot interpolation and extrapolation settings, can benefit from having images from exocentric viewpoints as supervision signals. No existing dataset provides the necessary mixture of complex, dynamic, and multi-view data. To facilitate the development of 3D and 4D reconstruction methods in the autonomous driving context, we propose the Synthetic Ego--Exo Dynamic 4D (SEED4D) dataset. SEED4D encompasses two large-scale multi-view synthetic urban scene datasets. Our static (3D) dataset comprises 212K inward- and outward-facing vehicle images from 2K scenes, while our dynamic (4D) dataset contains 16.8M images from 10K trajectories, each sampled at 100 points in time with egocentric images, exocentric images, and LiDAR data. We additionally present a customizable, easy-to-use data generator for spatio-temporal multi-view data creation. Our open-source data generator allows the creation of synthetic data for camera setups commonly used in the NuScenes, KITTI360, and Waymo datasets.
Static Ego-Exo Dataset. We introduce a novel dataset for few-view image reconstruction tasks in an autonomous driving setting. Our dataset contains 2002 single-timestep complex outdoor driving scenes, each offering six plus one outward-facing vehicle images and 100 images from exocentric viewpoints on a bounding sphere for supervision. Only a single vehicle in each scene is equipped with this setup. Ego views are 900 × 1600 pixels, matching the NuScenes dataset, and the surrounding exocentric views are 600 × 800 pixels.
Dynamic Ego-Exo Dataset. Our temporal dataset consists of 10.5K driving trajectories well suited for 4D forecasting, 4D reconstruction, or video prediction tasks. Each trajectory is 100 steps long, corresponding to 10 seconds of driving. The 10.5K trajectories come from a total of 498 scenes across all towns. In each scene, 21 vehicles are each equipped with six plus one outward-facing cameras and ten inward-facing surround-vehicle exocentric cameras. The ego views are 128 × 256 pixels and the exo views are 98 × 128 pixels.
The static and the dynamic ego--exo view datasets are visualized in the figure below. They differ mainly in image resolution and trajectory length, and have complementary strengths. The static ego--exo dataset contains 12K egocentric and 200K exocentric views; the dynamic ego--exo dataset contains 6.3M egocentric and 10.5M exocentric views.
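As a sanity check, the per-scene camera counts multiply out to the quoted totals. The snippet below is a small illustrative calculation, not part of the dataset tooling; it assumes the 12K/6.3M egocentric totals count the six surround cameras (not the extra seventh view):

```python
# Back-of-the-envelope check of the view counts quoted above.
# Dynamic dataset: 10.5K trajectories, 100 timesteps each,
# 6 outward-facing ego cameras and 10 exocentric cameras.
trajectories = 10_500
timesteps = 100
ego_cams = 6
exo_cams = 10

dynamic_ego_views = trajectories * timesteps * ego_cams  # 6,300,000
dynamic_exo_views = trajectories * timesteps * exo_cams  # 10,500,000

# Static dataset: 2002 scenes, 6 surround ego views and
# 100 spherical exocentric views per scene.
scenes = 2002
static_ego_views = scenes * 6     # 12,012 (~12K)
static_exo_views = scenes * 100   # 200,200 (~200K)

print(dynamic_ego_views, dynamic_exo_views, static_ego_views, static_exo_views)
```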
Multi-view Novel View Synthesis. We evaluate how well existing methods can reconstruct the scene given many spherical views. We divide the 100 spherical views into training and test data using an 80/20 split.
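A deterministic 80/20 split of the 100 spherical views can be obtained, for instance, by holding out every fifth view. This is an illustrative sketch; the benchmark's exact held-out view indices are defined by the dataset tooling:

```python
# Illustrative 80/20 split of 100 spherical view indices:
# hold out every fifth view for testing, train on the rest.
view_ids = list(range(100))
test_views = view_ids[::5]                    # 20 held-out views
train_views = [v for v in view_ids if v % 5]  # remaining 80 views

assert len(train_views) == 80 and len(test_views) == 20
```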
Monocular Metric Depth Estimation. Since our dataset contains ground-truth depth maps, we evaluated two recent monocular metric depth estimation methods. We tested the methods without further fine-tuning them on our data.
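The RMSE (m) metric reported in the results table is the standard root-mean-square error between predicted and ground-truth metric depth. A minimal reference implementation (a hypothetical helper for illustration, not taken from the SEED4D codebase):

```python
import math

def depth_rmse(pred, gt):
    """Root-mean-square error (in meters) between flat lists of
    predicted and ground-truth metric depth values."""
    assert len(pred) == len(gt)
    sq_err = [(p - g) ** 2 for p, g in zip(pred, gt)]
    return math.sqrt(sum(sq_err) / len(sq_err))

# Toy example: a constant 1 m error yields an RMSE of exactly 1.0.
print(depth_rmse([2.0, 3.0, 4.0], [1.0, 2.0, 3.0]))  # 1.0
```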
Single-shot Few-Image Scene Reconstruction. For few-image-to-3D reconstruction, we deviate from many existing comparisons by targeting an automotive use case: we evaluate methods on egocentric outward-facing input views while supervising the resulting novel views with 360° exocentric spherical views.
Evaluation | Method | PSNR ↑ | SSIM ↑ | LPIPS ↓ | RMSE (m) ↓ |
---|---|---|---|---|---|
Multi-view | SplatFacto [3,4] | 24.458 | 0.806 | 0.210 | - |
Multi-view | NeRFacto [5] | 24.936 | 0.804 | 0.227 | - |
Multi-view | K-Planes [1,2] | 25.744 | 0.816 | 0.239 | - |
Monocular Depth | ZoeDepth [7] | - | - | - | 12.352 |
Monocular Depth | Metric3D [6] | - | - | - | 7.668 |
Few-view | K-Planes [1,2] | 11.356 | 0.463 | 0.633 | - |
Few-view | SplatFacto [3,4] | 11.607 | 0.486 | 0.658 | - |
Few-view | NeRFacto [5] | 10.943 | 0.298 | 0.791 | - |
Few-view | PixelNeRF [8] | 14.500 | 0.550 | 0.652 | 19.235 |
Few-view | SplatterImage [9] | 17.791 | 0.580 | 0.568 | 11.049 |
Few-view | 6Img-to-3D [10] | 18.682 | 0.726 | 0.451 | 6.232 |
We welcome submissions. Please reach out to us via email!
Both datasets contained in this paper are generated using our data generator. The data generator provides an easy-to-use front end for the CARLA simulator. With it, one can easily define parameters such as the town, the vehicle's initial position, the weather, the number of traffic participants, and the number, kinds, and positions of sensors (both ego and exo views).
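A configuration for such a run could look roughly as follows. This is a hypothetical sketch: all field names are illustrative and do not reflect the generator's actual schema; consult the SEED4D generator documentation for the real interface. The values mirror the dynamic-dataset setup described above:

```python
# Hypothetical configuration sketch for a data-generation run.
# Field names are illustrative, not the generator's actual API.
config = {
    "town": "Town02",              # CARLA map to load
    "weather": "ClearNoon",        # CARLA weather preset
    "num_vehicles": 21,            # traffic participants in the scene
    "ego_cameras": {               # outward-facing ego sensors
        "count": 6,
        "resolution": (128, 256),
    },
    "exo_cameras": {               # inward-facing surround sensors
        "count": 10,
        "resolution": (98, 128),
    },
    "sensors": ["rgb", "depth", "lidar"],
    "trajectory_steps": 100,       # 10 s trajectories
}
print(config["town"])
```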
Several different sensor readings are available within our dataset, including egocentric and exocentric RGB images, ground-truth depth maps, and LiDAR point clouds.
The generated data come from Towns 1 to 7 and 10HD. Towns 1, 3 to 7, and 10HD are used for training, and all scenes from Town 2 are held out for evaluation.
Town | Number of Scenes | Split | Description |
---|---|---|---|
Town01 | 255 | Train | A small, simple town with a river and several bridges. |
Town03 | 265 | Train | A larger, urban map with a roundabout and large junctions. |
Town04 | 372 | Train | A small town embedded in the mountains with a special "figure of 8" infinite highway. |
Town05 | 302 | Train | Squared-grid town with cross junctions and a bridge. It has multiple lanes per direction. Useful to perform lane changes. |
Town06 | 436 | Train | Long many lane highways with many highway entrances and exits. It also has a Michigan left. |
Town07 | 116 | Train | A rural environment with narrow roads, corn, barns and hardly any traffic lights. |
Town10HD | 155 | Train | A downtown urban environment with skyscrapers, residential buildings and an ocean promenade. |
Town02 | 101 | Val | A small simple town with a mixture of residential and commercial buildings. |
Figure: example egocentric input views and exocentric supervision views for Towns 1–7.
@misc{kaestingschaefer2024seed4dsyntheticegoexodynamic,
title={SEED4D: A Synthetic Ego--Exo Dynamic 4D Data Generator, Driving Dataset and Benchmark},
author={Marius Kästingschäfer and Théo Gieruc and Sebastian Bernhard and Dylan Campbell and Eldar Insafutdinov and Eyvaz Najafli and Thomas Brox},
year={2024},
eprint={2412.00730},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.00730},
}