Unity Launches Synthetic Image Datasets To Train AI Models Faster

Unity came up with synthetic image datasets to simplify data collection and protect the confidentiality of data.


San Francisco-based video game software company Unity Technologies recently announced the launch of synthetic image datasets that help developers train artificial intelligence (AI) models for computer vision applications faster and at significantly lower cost.

Unity's cross-platform game engine is widely used by game developers worldwide to create interactive games and virtual reality and augmented reality applications. With this announcement, the company aims to use its datasets to help build AI models across industry verticals, including manufacturing, retail and security.



Tackling the privacy problem 

Unity believes most real-life data collection techniques are labour-intensive and expensive, and carry greater privacy risks than synthetic alternatives. The firm developed its synthetic image datasets to simplify data collection and protect the confidentiality of data.

Real data often contains sensitive information that programmers, software developers and researchers may not want disclosed. Synthetic data, however, holds no private information and cannot be traced back to a source.

Most importantly, synthetic data largely addresses confidentiality concerns and avoids the privacy issues that arise from using images of real people and places.

For example, collecting real data for self-driving cars is extremely expensive. Waymo, Alphabet's self-driving unit, has spent close to $3.5 billion testing Chrysler Pacificas in Silicon Valley and Phoenix. Over the past few years, around 30 self-driving car companies have spent close to $16 billion on developing fully self-driving cars.

At present, generating synthetic data is itself expensive and time-consuming: an AI model must first create it, and many companies lack the resources to do so. To that end, Unity offers the Unity Perception SDK and a library of labelling and randomisation tools for developers.

Eliminates data bias     

Unity said the usage of synthetic data eliminates the problem of ‘biased data,’ which often results in skewed outcomes, lower accuracy levels and analytics errors. 

“Data captured from the real world is often biased towards what is easy to collect, is subject to human labelling errors, and needs to be refreshed often, which can be very expensive,” said Danny Lange, SVP of AI at Unity, comparing it with synthetic datasets. 

Lange said the best AI results are achieved by combining a large amount of high-quality synthetic data with a small amount of real data, when possible. Synthetic datasets comply with privacy rules and accurately reflect real-world data, he added.

Further, he said these datasets let companies simulate scenarios that might soon occur in the real world, driven by a sizable increase in user data. “As a result, we see smarter indoor environments, such as cashier-less grocery stores, and more as our customers discover new applications,” said Lange.
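The mix Lange describes can be sketched in a few lines of Python. This is only an illustration: the function name `build_training_set` and the 10% real-data share are assumptions made for the example, not figures or tooling from Unity.

```python
import random


def build_training_set(synthetic, real, real_fraction=0.1, seed=0):
    """Combine every synthetic sample with a small share of real samples.

    real_fraction is the target share of real samples in the final set.
    The 0.1 default is an assumption for illustration, not a Unity figure.
    """
    if not 0 <= real_fraction < 1:
        raise ValueError("real_fraction must be in [0, 1)")

    rng = random.Random(seed)

    # Number of real samples needed to reach the target share,
    # capped by how many real samples are actually available.
    n_real = round(len(synthetic) * real_fraction / (1 - real_fraction))
    n_real = min(n_real, len(real))

    picked = rng.sample(list(real), n_real)

    # Tag each sample with its origin so the mix can be audited later.
    mixed = [("synthetic", s) for s in synthetic] + [("real", r) for r in picked]
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    synthetic_images = [f"syn_{i}.png" for i in range(900)]
    real_images = [f"real_{i}.png" for i in range(500)]
    training_set = build_training_set(synthetic_images, real_images)
    print(len(training_set))
```

Keeping the origin tag on every sample makes it easy to check how much real data ended up in the final mix.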

How does it work?

The new Unity computer vision datasets are based on synthetic data, which is generated by algorithms rather than collected from the real world. Synthetic data is widely used in computer vision applications, particularly object detection. In Unity's case, the artificial environment is most likely to be used for creating 3D models of objects and for learning to navigate environments from visual information.

The Unity engine creates ‘digital twins’ of objects (3D models) using photogrammetry techniques. Digital twins are virtual replicas of physical things, mainly used as a testbed for running simulations before a solution is actually deployed. The digital twins are then placed in various 3D environments, where randomisers vary lighting conditions, textures, camera positions, scale factors, and other parameters.

Image: Visual examples of image labelling. Source: Unity
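The randomisation step described above can be illustrated with a minimal Python sketch. It does not use Unity's engine or the Perception SDK; the names `SceneParams`, `sample_scene` and `sample_dataset`, the texture list and all value ranges are hypothetical, chosen only to show how each synthetic image can be given its own lighting, texture, camera position and scale.

```python
import random
from dataclasses import dataclass

# Hypothetical texture names, for illustration only.
TEXTURES = ("wood", "metal", "plastic", "fabric")


@dataclass(frozen=True)
class SceneParams:
    """Randomised settings for one rendered image of a digital twin."""
    light_intensity: float
    texture: str
    camera_position: tuple
    scale: float


def sample_scene(rng):
    """Draw one random scene configuration (all ranges are assumptions)."""
    return SceneParams(
        light_intensity=rng.uniform(0.2, 2.0),
        texture=rng.choice(TEXTURES),
        camera_position=(
            rng.uniform(-5.0, 5.0),  # x
            rng.uniform(0.5, 3.0),   # y (height)
            rng.uniform(-5.0, 5.0),  # z
        ),
        scale=rng.uniform(0.5, 1.5),
    )


def sample_dataset(n, seed=0):
    """Return n scene configurations; the same seed gives the same list."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]


if __name__ == "__main__":
    for params in sample_dataset(3):
        print(params)
```

Seeding the generator keeps a dataset reproducible, so the same set of scenes can be rendered again if an image needs to be checked or relabelled.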

Unity recommends the environments best suited to a customer’s computer vision problem and helps identify the most suitable dataset. Currently, the Unity team provides customers with hands-on support. Soon, the company plans to offer a simple self-service interface so that customers can generate additional features at their convenience.

Unity offers a tiered pricing model: the price per image falls as the number of synthetic images ordered increases.

Lange believes synthetic computer vision datasets can support a wide range of AI training use cases, ranging from object detection to improving the performance of AI models.

Amit Raja Naik
Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.