From microelectronics to mechanics and machine learning, modern-day robots are a marvel of multiple engineering disciplines. They use sensors, image processing and reinforcement learning algorithms to move objects around and to navigate around obstacles.
However, this is not the case when it comes to handling objects made of materials such as glass. Glass is transparent and reflects light non-uniformly, which makes it difficult for the sensors mounted on a robot to work out how to carry out even a simple pick-and-place operation.
To address this problem, researchers at Google AI, along with Synthesis AI and Columbia University, devised a novel machine-learning algorithm called ClearGrasp that is capable of estimating accurate 3D data of transparent objects from RGB-D images.
Google claims that the model identifies transparent objects quite well, even when an object sits against a patterned background or when transparent objects partially occlude one another.
Overview Of ClearGrasp
ClearGrasp uses three neural networks:
- A network that estimates surface normals
- A network that predicts occlusion boundaries (depth discontinuities)
- A network that masks transparent objects
This mask is used to remove all depth pixels belonging to transparent objects, so that the correct depths can then be filled in.
A global optimisation module then extends the depth from known surfaces, using the predicted surface normals to guide the shape of the reconstruction and the predicted occlusion boundaries to maintain the separation between distinct objects.
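The masking step described above can be sketched in a few lines of Python. This is only an illustration, not ClearGrasp's actual code: the function name `remove_transparent_depth`, the use of `0.0` as the "no depth" marker, and the toy 3x4 image are all assumptions made for the example. Plain nested lists stand in for image arrays, and the global optimisation that fills the removed depths back in is not shown.

```python
def remove_transparent_depth(depth, mask, invalid=0.0):
    """Return a copy of `depth` with every masked pixel set to `invalid`.

    depth: 2D list of depth readings in metres (hypothetical toy input).
    mask:  2D list of booleans, True where a pixel belongs to a
           transparent object (in ClearGrasp this comes from a network;
           here it is written by hand).
    """
    same_rows = len(depth) == len(mask)
    same_cols = all(len(d) == len(m) for d, m in zip(depth, mask))
    if not (same_rows and same_cols):
        raise ValueError("depth and mask must have the same shape")
    return [
        [invalid if is_transparent else d
         for d, is_transparent in zip(depth_row, mask_row)]
        for depth_row, mask_row in zip(depth, mask)
    ]


# Toy depth image: the two middle readings are unreliable sensor values
# of the kind a glass object might produce.
depth = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 0.3, 2.7, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]
mask = [
    [False, False, False, False],
    [False, True, True, False],
    [False, False, False, False],
]

cleaned = remove_transparent_depth(depth, mask)
```

After this step, the pixels inside the mask carry no depth at all, which leaves the later optimisation free to reconstruct them from the surrounding reliable surfaces rather than from noisy readings.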
Transparent objects can confuse sensors because optical 3D sensors are driven by algorithms that assume all surfaces reflect light evenly in all directions.
Hence, most of the depth data from transparent objects are invalid or contain unpredictable noise.
While the distorted view of the background seen through transparent objects is confusing, there are still clues about the objects’ shape.
Transparent surfaces exhibit mirror-like reflections that show up as bright spots in a well-lit environment. Convolutional neural networks (CNNs) can use these reflections to infer accurate surface normals, which can then be used for depth estimation.
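To see why surface normals are useful for depth estimation, it helps to look at the relationship in the other direction: a normal encodes the local slope of the depth. The sketch below derives a normal from a depth grid with central differences, treating the depth image as a simple height field. The function name `normal_from_depth`, the `pixel_size` parameter and the sign convention are assumptions for illustration, and this is not how ClearGrasp predicts normals (it uses a neural network on the colour image).

```python
import math


def normal_from_depth(depth, u, v, pixel_size=1.0):
    """Unit normal at interior pixel (u, v) of a depth grid.

    Uses central differences and the height-field convention
    n ~ (-dz/dx, -dz/dy, 1). Both the convention and `pixel_size`
    (spacing between neighbouring samples) are assumptions.
    """
    dz_dx = (depth[v][u + 1] - depth[v][u - 1]) / (2.0 * pixel_size)
    dz_dy = (depth[v + 1][u] - depth[v - 1][u]) / (2.0 * pixel_size)
    nx, ny, nz = -dz_dx, -dz_dy, 1.0
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


# A flat surface facing the camera, and one whose depth grows by 0.5
# per pixel from left to right.
flat = [[1.0, 1.0, 1.0] for _ in range(3)]
tilted = [[1.0, 1.5, 2.0] for _ in range(3)]

flat_normal = normal_from_depth(flat, 1, 1)
tilted_normal = normal_from_depth(tilted, 1, 1)
```

For the tilted surface, the ratio of the x and z components of the normal recovers the slope of 0.5. That is the sense in which a predicted normal constrains how depth may change between neighbouring pixels.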
Google AI’s ClearGrasp can work with inputs from any standard RGB-D camera, using deep learning to accurately reconstruct the depth of transparent objects and generalise to completely new objects unseen during training.
The model was trained on a large-scale synthetic dataset that contains more than 50,000 photorealistic renders, together with data representing the surface curvature, edges, and depth, which is useful for training a variety of 2D and 3D detection tasks.
To assess the performance of ClearGrasp qualitatively, 3D point clouds were constructed from the input and output depth images. Such point clouds can be crucial for applications such as 3D mapping and 3D object detection.
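Building a point cloud from a depth image is usually done by back-projecting each pixel through a pinhole camera model. The sketch below shows that standard technique under stated assumptions: the intrinsics `fx`, `fy`, `cx` and `cy` are made-up values, a depth of zero or less is taken to mean "no reading", and the function name `depth_to_point_cloud` is hypothetical. The source does not say which tooling the researchers used for this step.

```python
def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image into a list of (x, y, z) points.

    depth:  2D list of depths in metres; values <= 0 are skipped as
            invalid (an assumed convention).
    fx, fy: focal lengths in pixels.
    cx, cy: principal point in pixels.
    """
    points = []
    for v, row in enumerate(depth):        # v is the pixel row
        for u, z in enumerate(row):        # u is the pixel column
            if z <= 0:
                continue                   # no depth reading here
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth image with one missing reading and invented intrinsics.
depth = [
    [2.0, 2.0],
    [0.0, 4.0],
]
cloud = depth_to_point_cloud(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

Running the same function on a raw depth image and on a completed one gives two point clouds that can be compared side by side, which is the kind of qualitative check described above.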
Previous methods required prior knowledge of the transparent objects along with maps of background lighting and camera positions.
Better sensing of transparent surfaces would not only improve safety but could also open up a range of new interactions in unstructured applications.
Modern-day robots can deliver your Amazon package by air, throw a ball through a hoop, arrange bottles on pegs and even move meticulously enough to paint intricate paintings. And now, with ClearGrasp, they can generate AR visualisations on glass tabletops and may even be seen serving wine soon!