AUTHORS: Giulio Federico, Fabio Carrara, Giuseppe Amato, Marco Di Benedetto
WORK PACKAGE: WP 10 – Retina
URL: Pageflex Server [document: D-VTT-2555FF3A_00001]
Keywords: Generative AI, Computer Graphics, Denoising Diffusion Probabilistic Model, Gaussian
Splatting, NeRF, Signed Distance Field, Video Reconstruction, Deep Learning, Machine
Learning, Artificial Intelligence, Text-to-3D, Image-to-3D, Urban Environment, Score
Distillation Sampling
Abstract
The reconstruction of large-scale real outdoor environments is crucial for promoting the adoption
of Extended Reality (XR) in industrial and entertainment sectors. This task often requires significant
resources such as depth cameras, LiDAR sensors, drones, and others, alongside traditional data
processing pipelines like Structure-from-Motion (SfM), which demand extensive computational
resources, thus preventing real-time processing. Additional constraints arise from limited
access to these resources. While 3D laser scanners (e.g., LiDAR) are precise and fast,
they are expensive and often bulky (especially the high-quality models), and their effectiveness
depends on the type of environment being scanned. Depth sensors offer a more affordable and
compact alternative; however, due to their limited range, they are ideal only for indoor settings.
Photogrammetry, while capable of producing high-quality results at a lower cost, can be
time-consuming and computationally intensive. It also suffers from limited accuracy, strong dependence on
lighting conditions, and the need for numerous photos from various angles that are not always easily
accessible. (…)