AI Technology Makes It Possible to 3D Scan and Edit Real-World Objects

Imagine being able to view a realistic, fully editable 3D model of an object from any angle simply by sweeping your smartphone around it. Advances in AI are rapidly making this a reality.

Researchers at Simon Fraser University (SFU) in Canada have unveiled new AI technology for precisely this purpose. Soon, consumers will be able to capture real-life objects in 3D rather than just 2D, and edit their shape and appearance as easily as they edit ordinary 2D photos today.

At the 2023 Conference on Neural Information Processing Systems (NeurIPS) in New Orleans, Louisiana, the researchers presented Proximity Attention Point Rendering (PAPR), a novel method that converts a collection of 2D images of an object into a cloud of 3D points representing the object’s shape and appearance. The paper was published on the arXiv preprint server.

Each point then serves as a knob for manipulating the object: dragging a point modifies the shape, and editing its properties alters the appearance. Finally, through a procedure called “rendering,” the 3D point cloud can be viewed from any angle and converted into a 2D image that accurately depicts the edited object from that viewpoint.
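As a rough illustration of this point-as-knob idea, here is a minimal Python sketch (not the authors’ code; all names are hypothetical) in which each point stores a position and a set of appearance attributes, and editing simply means changing those values before re-rendering:

```python
# Conceptual sketch only: each point in the cloud carries a 3D position (the
# "shape" knob) and appearance attributes, and an edit just changes those
# values before the scene is rendered again.
from dataclasses import dataclass, field

@dataclass
class EditablePoint:
    position: tuple            # (x, y, z) location in 3D space
    features: dict = field(default_factory=dict)  # appearance attributes

def drag_point(point, offset):
    """Move a point; a learned interpolator would deform the surface around it."""
    x, y, z = point.position
    dx, dy, dz = offset
    return EditablePoint((x + dx, y + dy, z + dz), dict(point.features))

# Example edit: nudge one point of the cloud upward and brighten it.
p = EditablePoint((0.0, 0.0, 0.0), {"brightness": 0.5})
p = drag_point(p, (0.0, 0.1, 0.0))
p.features["brightness"] = 0.8
```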

The researchers demonstrated the new AI technology by bringing a statue to life. The technology automatically turned a collection of images of the statue into a 3D point cloud, which was then animated. The final product is a video in which the statue moves its head from side to side as the viewpoint travels along a path around it.

AI and machine learning are driving a paradigm shift in the reconstruction of 3D objects from 2D images. According to Dr. Ke Li, senior author of the paper and assistant professor of computer science at Simon Fraser University (SFU), “The outstanding success of machine learning in fields like computer vision and natural language is inspiring researchers to investigate how traditional 3D graphics pipelines can be re-engineered with the same deep learning-based building blocks that were responsible for the recent wave of AI success stories.”

“It turns out that doing so successfully is a lot harder than we anticipated and requires overcoming several technical challenges. What excites me the most is the many possibilities this brings for consumer technology—3D may become as common a medium for visual communication and expression as 2D is today.”

Creating a 3D representation of shapes that is easy and intuitive for users to edit is one of the main challenges in 3D modeling. One prior method, neural radiance fields (NeRFs), requires the user to describe what happens to each continuous coordinate, which makes shape editing difficult. A more recent method, 3D Gaussian splatting (3DGS), is also unsuitable for shape editing because the shape’s surface may become crushed or shattered after editing.

One of the researchers’ most important insights was that every 3D point in the point cloud could be thought of as a control point in a continuous interpolator rather than as a discrete splat. When a point is moved, the shape then changes automatically and intuitively. Animators use a similar method to define object motion in animated videos: they specify an object’s position at a few points in time, and an interpolator automatically generates the object’s position at every point in time.
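To make the control-point analogy concrete, the following short Python sketch (an illustration under simplified assumptions, using plain piecewise-linear interpolation rather than PAPR’s learned interpolator) shows how moving one keyframe automatically changes every in-between value:

```python
# Keyframe-style interpolation: values between control points come from an
# interpolator, so moving one control point smoothly changes everything it
# influences -- the same way keyframes drive motion in animation.
import numpy as np

def interpolate(query_times, key_times, key_values):
    """Piecewise-linear interpolation of keyframe values at arbitrary times."""
    return np.interp(query_times, key_times, key_values)

key_times = np.array([0.0, 1.0, 2.0])
key_values = np.array([0.0, 1.0, 0.0])      # object position at three keyframes
queries = np.linspace(0.0, 2.0, 5)

print(interpolate(queries, key_times, key_values))   # [0.  0.5 1.  0.5 0. ]

# Moving the middle keyframe automatically changes the in-between positions too.
key_values[1] = 2.0
print(interpolate(queries, key_times, key_values))   # [0.  1.  2.  1.  0. ]
```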

However, it is not easy to mathematically define an interpolator between an arbitrary set of 3D points. The researchers therefore developed a machine learning model that learns the interpolator end-to-end using a novel mechanism called proximity attention.
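For intuition only, here is a simplified Python sketch of what a proximity-based attention weighting might look like; in PAPR the attention and features are learned end-to-end, whereas this toy version just applies a softmax over negative distances:

```python
# Loose sketch of proximity-based attention (assumed form, not the paper's
# exact formulation): a query location attends to nearby control points, with
# closer points receiving higher weight, and the interpolated feature is the
# attention-weighted blend of the points' features.
import numpy as np

def proximity_attention(query, points, features, temperature=0.1):
    """Interpolate features at `query` via a softmax over negative distances."""
    dists = np.linalg.norm(points - query, axis=1)   # distance to each control point
    logits = -dists / temperature                    # closer -> larger logit
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                         # softmax attention weights
    return weights @ features                        # weighted feature blend

points = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
features = np.array([[1.0, 0.0],                     # per-point feature vectors
                     [0.0, 1.0],
                     [0.5, 0.5]])
print(proximity_attention(np.array([0.1, 0.1, 0.0]), points, features))
```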

In recognition of this technical advance, the paper was selected for a spotlight at the NeurIPS conference, an honor given to the top 3.6% of paper submissions.

The research team is looking forward to the future with excitement. “This opens the way to many applications beyond what we’ve demonstrated,” Dr. Li added. “We are already exploring various ways to leverage PAPR to model moving 3D scenes and the results so far are incredibly promising.”

Ke Li, Yanshu Zhang, Shichong Peng, and Alireza Moazeni are the paper’s authors. Zhang, Peng, and Moazeni are Ph.D. candidates in the School of Computing Science at Simon Fraser University (SFU), and Zhang and Peng are co-first authors.
