ML Democratisation: What Can We Learn From Scale AI’s Rise To Prominence

If one were to list the top five challenges of building machine learning models, the first four would probably be complaints about the lack of clean data.

A machine learning model is only as good as the data it is fed. Whether the models are called black boxes or magic boxes, the results depend on how clean the input data is.


Rather than succumb to this challenge, researchers have developed many unsupervised and semi-supervised techniques over the years. However, nothing is more attractive to an ML engineer than a well-labelled dataset.

A self-driving car must be accurate; there is no room for second-guessing. Its accuracy improves drastically when it has been trained on data annotated for different colors, shapes, sizes, signs and angles.

The question is: where can one get that kind of data?

The answer to this question was given by Alexandr Wang, an MIT dropout who founded Scale. 

Scale AI aims to accelerate the development of AI by democratizing access to intelligent data. Companies like Cruise, Alphabet, Voyage, nuTonomy, Embark, DriveAI, Starsky and others use its API for autonomous vehicles and other use cases to turn raw information into human-labeled training data that dependably powers their AI applications.

Scale uses a combination of high-quality human task work, smart tools, statistical confidence checks and machine learning to consistently return scalable, precise data.
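To make the idea of a statistical confidence check concrete, here is a minimal sketch of one common approach: collect labels from several human annotators, take the majority label, and flag the item for review when agreement is low. This is an illustration only, not Scale's actual pipeline; the function name aggregate_labels, the 0.7 threshold and the sample image IDs are all assumptions made for this example.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.7):
    """Combine several annotators' labels for one item.

    Hypothetical helper: returns (label, agreement, needs_review), where
    agreement is the share of annotators who chose the majority label and
    needs_review is True when that share falls below min_agreement.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    label, top_count = counts.most_common(1)[0]
    agreement = top_count / len(votes)
    return label, agreement, agreement < min_agreement


# Made-up votes from three annotators per image.
votes_by_image = {
    "img_001": ["car", "car", "car"],
    "img_002": ["car", "truck", "car"],
    "img_003": ["sign", "pedestrian", "cyclist"],
}

for image_id, votes in votes_by_image.items():
    label, agreement, needs_review = aggregate_labels(votes)
    status = "send to review" if needs_review else "accept"
    print(f"{image_id}: {label} ({agreement:.0%} agreement) -> {status}")
```

In this sketch, only the image on which all three annotators agree is accepted automatically; the other two fall below the assumed threshold and would go back for another look. A production system would likely weigh annotators by their track record and combine the votes with model predictions.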

In short, Scale is a billion-dollar data labeler and an enabler of machine learning for the masses.

How Scale Got It Right

According to the founder of Scale, the motive for having a data labelling platform can be summarised as follows:

  • Safe, accurate and unbiased AI systems depend on large volumes of high quality training data.
  • The process to acquire, label, and verify training data is slow, manual and expensive.
  • Today’s training data bottleneck limits AI’s impact to a small group of well-funded technology companies.


Over the past 3 years, Scale AI has scaled up its services and now has a rich and diverse portfolio of customers including Waymo, OpenAI, Airbnb and many others. This success owes a great deal to the domains that were chosen. Here are a few of its diverse applications:

  • Mapping: Using satellite, drone, and street-level imagery for use cases from agriculture to insurance. 
  • Offline retail: Cashierless checkouts, inventory management systems.
  • AR and VR systems used in gaming, real estate or manufacturing.

“Machine learning is likely the most important technological shift of our time, and the overall benefits to the world will be comparable to those of the internet,” wrote Wang on securing the company’s latest round of funding.

To date, Scale AI has raised $122.7 million in funding at a valuation of over $1B. With its latest round, a Series C led by Founders Fund with continued investment from Index Ventures, Accel and Y Combinator, the future looks bright for Scale.

The success of Scale, hopefully, will open up more avenues for both consumers and providers alike.

Data Labeling As An Industry

The debate over whether AI creates or takes away jobs still keeps people busy. The end game of the highly ambitious journey towards AGI cannot be foretold, but those in favour of AI can find some comfort in the fact that data labeling as a service has become a thing.

For example, in India, where the AI scene is yet to leave the crib, a few startups have already started to offer data labeling services. And it doesn’t stop there. A few individuals have been running this service remotely. They hire a group of people whose only job is to sit and annotate the images in a dataset: they see an image of a car and label it as ‘car’, and so on.

This labeled data is then offered to companies that build machine learning models for various image recognition applications.

Considering that data gathering and cleaning consume the better part of a typical ML engineer’s time, data annotators have found a niche within the vast world of data-driven solutions.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
