NVIDIA Research has developed Neuralangelo, an AI model for 3D reconstruction that utilises neural networks to transform 2D video clips into detailed 3D structures. This innovative model generates lifelike virtual replicas of real-world objects, including buildings, sculptures, and more. Similar to how Michelangelo sculpted intricate visions from blocks of marble, Neuralangelo creates 3D structures with remarkable details and textures. Creative professionals can import these 3D objects into design applications, further editing them for use in various fields such as art, video game development, robotics, and industrial digital twins.
What sets Neuralangelo apart is its ability to accurately translate complex textures from 2D videos to 3D assets, surpassing previous methods. This high fidelity makes it easier for developers and creative professionals to rapidly create usable virtual objects from footage captured on a smartphone. The 3D reconstruction capabilities offered by Neuralangelo will greatly benefit creators by enabling them to recreate the real world in the digital realm. Ming-Yu Liu, senior director of research and co-author of the paper, emphasised that the tool will eventually allow developers to import detailed objects, from small statues to massive buildings, into virtual environments for video games or industrial digital twins.
During a demonstration, NVIDIA researchers showcased Neuralangelo’s ability to recreate a variety of objects, ranging from iconic sculptures like Michelangelo’s David to everyday objects like a flatbed truck. The model also demonstrated its capability to reconstruct building interiors and exteriors, as evidenced by a detailed 3D model of the park at NVIDIA’s Bay Area campus.
Neuralangelo employs instant neural graphics primitives, the technology behind NVIDIA Instant NeRF, to capture fine details that previous AI models struggled with, including repetitive texture patterns, homogeneous colours, and strong colour variations. The model takes a 2D video of an object or scene filmed from multiple angles, selecting several frames that provide different viewpoints. This approach mimics an artist considering a subject from various sides to grasp its depth, size, and shape. Neuralangelo’s AI then creates a rough 3D representation of the scene, similar to a sculptor chiselling out the shape of the subject. The model then optimises the render to sharpen details, much like a sculptor carefully hewing stone to mimic the texture of fabric or the contours of a human figure. The final output is a high-quality 3D object or large-scale scene suitable for virtual reality applications, digital twins, or robotics development.
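The rough-shape-first, details-later strategy described above can be illustrated with a toy numerical sketch. The grid resolutions, learning rate, and piecewise-constant "grids" below are illustrative stand-ins for Neuralangelo's actual multi-resolution hash encoding and neural surface optimisation, not the real implementation:

```python
import numpy as np

# Toy coarse-to-fine fit of a 1D signal, echoing the idea of optimising a
# rough shape first and progressively activating finer grids for detail.

def fit_coarse_to_fine(target, levels=(4, 16, 64), steps=200, lr=0.5):
    """Fit `target` (sampled on [0,1)) with piecewise-constant grids,
    activating finer grids in stages and summing their contributions."""
    n = len(target)
    x = np.arange(n)
    grids = [np.zeros(res) for res in levels]   # one value per grid cell
    residual_history = []
    for level, res in enumerate(levels):
        cell = x * res // n                     # which cell each sample hits
        for _ in range(steps):
            # current prediction: sum of all grids activated so far
            pred = sum(g[x * len(g) // n] for g in grids[:level + 1])
            err = pred - target
            # gradient step on the newest (finest active) grid only
            grad = np.bincount(cell, weights=err, minlength=res)
            count = np.bincount(cell, minlength=res).clip(min=1)
            grids[level] -= lr * grad / count
        residual_history.append(np.abs(err).mean())
    return grids, residual_history

# A signal with both smooth structure and fine detail.
t = np.linspace(0, 1, 256, endpoint=False)
target = np.sin(2 * np.pi * t) + 0.2 * np.sin(32 * np.pi * t)
_, residuals = fit_coarse_to_fine(target)
print(residuals)   # mean error shrinks as finer grids are activated
```

Each stage fits only the residual left by the coarser stages, so detail accumulates progressively, much like roughing out a form before carving fine texture.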
Neuralangelo is among the nearly 30 projects by NVIDIA Research that will be presented at the Conference on Computer Vision and Pattern Recognition (CVPR), taking place June 18-22 in Vancouver. These projects cover a wide range of topics, including pose estimation, 3D reconstruction, and video generation. One notable project, DiffCollage, utilises a diffusion method to create large-scale content such as landscape images, 360-degree panoramas, and looped-motion visuals. By treating smaller images as sections of a larger visual, like pieces of a collage, DiffCollage enables diffusion models to generate cohesive-looking content without being trained on images of the same scale. The technique can also transform text prompts into video sequences, as demonstrated by a pretrained diffusion model capturing human motion.
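The collage idea, composing the output for a large canvas from models that only ever see smaller overlapping pieces, can be sketched in one dimension. The quadratic "scores" below stand in for a real diffusion model's learned score, and all names, sizes, and the score-ascent loop are illustrative assumptions, not the paper's actual method:

```python
import numpy as np

# Toy 1D illustration of composing a large canvas from overlapping patches:
# add each patch's score, then subtract the doubly counted overlap regions.

CANVAS = 12          # length of the large 1D "image"
PATCH, OVERLAP = 6, 3

def patch_score(x, target):
    """Stand-in for a diffusion score: pulls x toward a Gaussian mean."""
    return target - x

def collage_score(x, target):
    """Full-canvas score from patch scores minus overlap scores."""
    score = np.zeros_like(x)
    starts = range(0, CANVAS - PATCH + 1, PATCH - OVERLAP)
    for s in starts:
        sl = slice(s, s + PATCH)
        score[sl] += patch_score(x[sl], target[sl])   # add each patch
    for s in list(starts)[1:]:
        sl = slice(s, s + OVERLAP)
        score[sl] -= patch_score(x[sl], target[sl])   # remove double count
    return score

rng = np.random.default_rng(0)
target = np.sin(np.linspace(0, np.pi, CANVAS))   # the coherent large image
x = rng.normal(size=CANVAS)                      # start from noise
for _ in range(100):                             # simple score ascent
    x += 0.2 * collage_score(x, target)
print(np.abs(x - target).max())                  # canvas converges to target
```

Because the overlap subtraction cancels the double counting, every canvas position feels a single consistent pull, so the stitched pieces settle into one coherent whole rather than visible seams.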