TensorFlow has introduced the Depth API along with ARPortraitDepth, a model that estimates a depth map from a single portrait image. It has also presented a computational photography application, 3D photo, which uses the predicted depth to enable a 3D parallax effect on a given portrait image. TensorFlow has launched a live demo where enthusiasts can try converting their photographs into 3D versions.
TensorFlow explains that ARPortraitDepth takes a single colour portrait image as input and produces a depth map. The encoder progressively halves the image or feature map resolution, and the decoder upsamples the features back to the resolution of the input.
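The resolution schedule of such an encoder-decoder can be sketched as follows. This is a minimal illustration, not the actual ARPortraitDepth code; the function names and the number of levels are assumptions for the example.

```javascript
// Each encoder level halves the spatial resolution of the feature map.
function encoderResolutions(inputRes, levels) {
  const res = [inputRes];
  for (let i = 0; i < levels; i++) {
    res.push(res[res.length - 1] / 2);
  }
  return res;
}

// The decoder mirrors the encoder, doubling resolution back up to the
// input size, so every decoder layer has an encoder counterpart at the
// same spatial resolution.
function decoderResolutions(inputRes, levels) {
  return encoderResolutions(inputRes, levels).reverse();
}

console.log(encoderResolutions(256, 4)); // → [256, 128, 64, 32, 16]
```

This mirrored schedule is what makes the skip connections described next possible: matching resolutions on both sides can be concatenated directly.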
It adds, “Deep learning features from the encoder are concatenated to the corresponding layers with the same spatial resolution in the decoders to bring high-resolution signals for depth estimation. During training, we force the decoder to produce depth predictions with increasing resolutions at each layer and add a loss for each of them with the ground truth. This empirically helps the decoder to predict accurate depth by gradually adding details.”
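The multi-resolution supervision in the quote can be sketched as a sum of per-scale losses: each decoder layer emits a depth prediction, and each prediction is compared with a ground-truth depth map at the same resolution. The L1 penalty below is an illustrative stand-in; the blog post does not specify the exact loss function.

```javascript
// Mean absolute error between a predicted and a ground-truth depth map,
// both given as flat arrays of the same length.
function l1Loss(pred, gt) {
  let sum = 0;
  for (let i = 0; i < pred.length; i++) sum += Math.abs(pred[i] - gt[i]);
  return sum / pred.length;
}

// predictions: depth maps emitted by successive decoder layers, coarse
// to fine; groundTruths: ground-truth depth at the matching resolutions.
// The total loss supervises every scale, not just the final output.
function multiScaleDepthLoss(predictions, groundTruths) {
  let total = 0;
  for (let s = 0; s < predictions.length; s++) {
    total += l1Loss(predictions[s], groundTruths[s]);
  }
  return total;
}
```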
To improve robustness against background variation, it runs an off-the-shelf body segmentation model, built with MediaPipe and TensorFlow.js, before feeding the image into the depth estimation neural network.
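The pre-processing step amounts to masking out background pixels before depth inference. In the real pipeline the mask comes from the MediaPipe / TensorFlow.js body segmentation model; the sketch below assumes it is already available as a plain binary array.

```javascript
// Zero out background pixels using a binary segmentation mask.
// rgbPixels: flat [r, g, b, r, g, b, ...] array;
// mask: one entry per pixel, 1 = person, 0 = background.
function applySegmentationMask(rgbPixels, mask) {
  const out = new Array(rgbPixels.length);
  for (let p = 0; p < mask.length; p++) {
    for (let c = 0; c < 3; c++) {
      out[3 * p + c] = mask[p] ? rgbPixels[3 * p + c] : 0;
    }
  }
  return out;
}
```

With the background held constant (all zeros), the depth model only ever sees the subject, which is what makes it robust to whatever scene the photo was taken in.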
TensorFlow added that it built a high-performance rendering pipeline for the 3D photo application. First, it generates a segmentation mask using the existing TensorFlow.js body segmentation API. It then passes the masked portrait to the Portrait Depth API and obtains a depth map on the GPU. Next, it generates a depth mesh in three.js, with vertices arranged in a regular grid and displaced by re-projecting the corresponding depth values.
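The depth-mesh step can be sketched as back-projecting each grid vertex into 3D with a simple pinhole camera model. The focal length and grid size below are illustrative assumptions; the actual demo builds the mesh as three.js geometry on the GPU.

```javascript
// Build depth-mesh vertices: one vertex per pixel of a regular grid,
// displaced by re-projecting its depth value through a pinhole model.
// depth: flat array of per-pixel depth values, row-major.
function buildDepthMesh(depth, width, height, focal) {
  const vertices = [];
  const cx = (width - 1) / 2;
  const cy = (height - 1) / 2;
  for (let v = 0; v < height; v++) {
    for (let u = 0; u < width; u++) {
      const d = depth[v * width + u];
      // Back-project pixel (u, v) with depth d into camera space.
      vertices.push([(u - cx) * d / focal, (v - cy) * d / focal, d]);
    }
  }
  return vertices;
}
```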
Finally, it applies texture projection to the depth mesh and rotates the camera around the z-axis in a circle, producing the parallax effect.
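The camera motion in the last step can be parametrised as a point on a circle in the x-y plane, one position per animation frame. The radius and z offset are illustrative, not values from the demo.

```javascript
// Position a camera on a circle around the z-axis at a given angle,
// keeping its distance from the mesh (z) fixed. Advancing the angle
// each frame produces the orbiting parallax motion.
function orbitPosition(radius, angle, z) {
  return {
    x: radius * Math.cos(angle),
    y: radius * Math.sin(angle),
    z: z,
  };
}
```

In a three.js render loop, the returned position would be copied into `camera.position` each frame, with the camera kept looking at the mesh centre.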