OpenAI is here with yet another project. This time it’s Shap-E, a conditional generative model for 3D assets. According to the paper, unlike other 3D generative models that produce a single output representation, Shap-E directly generates the parameters of implicit functions that can be rendered as both textured meshes and neural radiance fields (NeRFs), all from a single text prompt.
One of OpenAI’s few open-source offerings, Shap-E is available on GitHub with its model weights, inference code, and samples.
You can find the GitHub repository for Shap-E here.
According to the paper, Shap-E is trained in two stages. Firstly, an encoder is trained, which maps 3D assets deterministically into the parameters of an implicit function. Secondly, a conditional diffusion model is trained on the encoder’s outputs.
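To make the two stages concrete, here is a toy sketch of the data flow in plain Python. It is not OpenAI’s code: `encode_asset` and `make_diffusion_example` are hypothetical names, the "latent" is just a centroid and bounding-box size rather than real implicit-function parameters, and the noising step is the standard DDPM forward process, assumed here for illustration.

```python
import math
import random


def encode_asset(points):
    """Stage 1 stand-in (hypothetical): deterministically map a 3D asset,
    given as a list of (x, y, z) points, to a fixed-size latent.
    The real encoder outputs parameters of an implicit function; this toy
    version returns the centroid followed by the bounding-box size."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    extent = [max(p[i] for p in points) - min(p[i] for p in points)
              for i in range(3)]
    return centroid + extent


def make_diffusion_example(latent, caption, alpha_bar, rng):
    """Stage 2 stand-in (hypothetical): build one training example for a
    text-conditional diffusion model by noising the encoder latent:
    noisy = sqrt(alpha_bar) * latent + sqrt(1 - alpha_bar) * noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    noisy = [a * x + b * e for x, e in zip(latent, noise)]
    return {
        "noisy_latent": noisy,   # what the diffusion model sees
        "condition": caption,    # the paired text
        "clean_latent": latent,  # what it should learn to recover
        "noise": noise,
    }


# A made-up "asset" and caption, purely for illustration.
cube = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 4.0)]
latent = encode_asset(cube)
example = make_diffusion_example(latent, "a tall box", alpha_bar=0.5,
                                 rng=random.Random(0))
```

The point of the split is that the diffusion model never touches meshes or NeRFs directly. It only learns to denoise the encoder’s latents, and rendering happens after the latent is decoded.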
“Our models can generate complex and diverse 3D assets in just a few seconds when trained on a large dataset of paired 3D and text data,” reads the paper.
The interesting part about OpenAI’s Shap-E is that despite modeling a higher-dimensional, multi-representation output space, Shap-E converges faster and produces comparable or better sample quality than Point-E.
Though the generated 3D objects might look pixelated and rough, each one can be produced from a single text prompt. As the paper points out, the model has another limitation: for now, it handles only single-object prompts with simple attributes and struggles to bind multiple attributes to different objects.
OpenAI also recently released Point-E, which was touted as a 3D DALL-E 2. Diffusion, the technique behind DALL-E 2 and Point-E, is also used in Shap-E. But this time, instead of diffusing over point clouds as Point-E does, the model generates implicit functions that users can render as textured meshes or NeRFs.