Researchers including Klaus Greff, Google AI researcher Francois Belletti and Lucas Beyer have released a paper on Kubric, a scalable dataset generator. Kubric is an open-source Python framework that interfaces with both PyBullet and Blender to create high-quality images. PyBullet handles the physics simulation, such as how objects collide and interact with one another, while Blender renders the images. The tool was built to reduce the cost and resources involved in obtaining the well-annotated, unbiased data that is hard to collect from the real world.
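To make the division of labour concrete, here is a minimal sketch of the same two-stage idea: simulate first, then render, and keep the ground-truth annotation for every frame. It deliberately does not use Kubric, PyBullet or Blender. The `Ball` class and the `simulate`, `render_mask` and `generate_sequence` functions are hypothetical names invented for this illustration, and the "renderer" only produces a tiny segmentation mask rather than a photo-realistic image.

```python
from dataclasses import dataclass

GRAVITY = -9.81  # m/s^2, acts along the vertical axis


@dataclass
class Ball:
    z: float             # height of the ball centre above the floor, in metres
    vz: float = 0.0      # vertical velocity, in m/s
    radius: float = 0.5  # ball radius, in metres


def simulate(ball, num_frames, dt=1 / 24, restitution=0.8):
    """Physics stage (toy stand-in for a physics engine): one height per frame."""
    heights = []
    for _ in range(num_frames):
        ball.vz += GRAVITY * dt       # semi-implicit Euler: update velocity first
        ball.z += ball.vz * dt        # then update position
        if ball.z < ball.radius:      # floor contact: clamp and bounce
            ball.z = ball.radius
            ball.vz = -ball.vz * restitution
        heights.append(ball.z)
    return heights


def render_mask(height, radius, size=16, world_height=4.0):
    """Rendering stage (toy stand-in for a renderer): a size x size mask.

    A value of 1 marks the ball and 0 marks background. Row 0 is the image top.
    """
    scale = world_height / size       # metres per pixel
    cx = world_height / 2             # the ball is horizontally centred
    mask = []
    for row in range(size):
        y = world_height - (row + 0.5) * scale  # pixel-centre height in metres
        line = []
        for col in range(size):
            x = (col + 0.5) * scale
            inside = (x - cx) ** 2 + (y - height) ** 2 <= radius ** 2
            line.append(1 if inside else 0)
        mask.append(line)
    return mask


def generate_sequence(num_frames=24):
    """Run both stages and pair every frame with its ground-truth annotation."""
    ball = Ball(z=3.0)
    heights = simulate(ball, num_frames)
    return [
        {"frame": i, "ball_height": h, "segmentation": render_mask(h, ball.radius)}
        for i, h in enumerate(heights)
    ]
```

Calling `generate_sequence()` returns one record per frame, each holding the simulated ball height and a matching mask. The point of the sketch is that the annotations come for free because the generator already knows the full scene state, which is the property that makes synthetic data attractive.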
The research paper demonstrates Kubric's effectiveness with a series of 13 separate datasets generated for tasks such as unsupervised multi-object video detection. The tasks covered range from studying 3D NeRF models to optical flow estimation. Kubric generates photo-realistic scenes that are richly annotated, and its jobs scale easily to large workloads distributed over thousands of machines, which lets the tool produce huge volumes of synthetic data.
Despite the urgent need for cheaper, well-annotated and unbiased data, few software tools generate effective, usable data. Synthetic data has become increasingly popular in recent years because of its many advantages: lower cost, rich annotations, full control over the data for researchers, and fewer of the risks associated with licensing and privacy.