If one were to list the top five challenges of building machine learning models, the first four would probably be complaints about the lack of clean data.
A machine learning model is only as good as the data it is fed. Whether these models are called black boxes or magic boxes, their results depend on how clean the input data is.
To work around the shortage of clean data, many unsupervised and semi-supervised techniques have been developed over the years. However, nothing is more attractive to an ML engineer than a well-labelled dataset.
A self-driving car should be accurate. There is no room for second-guessing. A self-driving car's accuracy improves drastically if it has been trained on data that has been annotated with respect to different colors, shapes, sizes, signs and angles.
The question is where one can get that kind of data.
The answer to this question was given by Alexandr Wang, an MIT dropout who founded Scale.
Scale AI aims to accelerate the development of AI by democratizing access to intelligent data. Through its API for autonomous vehicles and other use cases, companies like Cruise, Alphabet, Voyage, nuTonomy, Embark, DriveAI, Starsky and others use Scale to turn raw information into human-labeled training data that dependably powers their AI applications.
Scale uses a combination of high-quality human task work, smart tools, statistical confidence checks and machine learning to consistently return scalable, precise data.
In short, Scale is a billion-dollar data labeler that makes machine learning accessible to the masses.
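To make the idea of combining human task work with statistical confidence checks concrete, here is a minimal sketch of one common approach: several annotators label the same item, the majority label wins, and items with low agreement are flagged for review. This is not Scale's actual pipeline, which the article does not describe in detail. The function name `aggregate_labels`, the `min_agreement` threshold of 0.7 and the sample votes are all assumptions made for illustration.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.7):
    """Majority-vote one item's labels.

    Returns (label, agreement, accepted), where agreement is the share of
    annotators who chose the winning label. The 0.7 default threshold is an
    illustrative assumption, not a value taken from any real service.
    """
    if not votes:
        raise ValueError("votes must be non-empty")
    counts = Counter(votes)
    label, top_count = counts.most_common(1)[0]
    agreement = top_count / len(votes)
    return label, agreement, agreement >= min_agreement


# Hypothetical votes from three annotators per image.
items = {
    "img_001": ["car", "car", "car"],
    "img_002": ["car", "truck", "car"],
    "img_003": ["sign", "pedestrian", "cyclist"],
}

results = {name: aggregate_labels(votes) for name, votes in items.items()}

# Items whose agreement falls below the threshold go back for another pass.
needs_review = [name for name, (_, _, ok) in results.items() if not ok]
```

With these sample votes, only the unanimous image is accepted outright; the other two would be routed to additional annotators or a reviewer. A production system would likely also weight annotators by their track record, which this sketch leaves out.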
How Scale Got It Right
According to the founder of Scale, the motive for having a data labelling platform can be summarised as follows:
- Safe, accurate and unbiased AI systems depend on large volumes of high quality training data.
- The process to acquire, label, and verify training data is slow, manual and expensive.
- Today’s training data bottleneck limits AI’s impact to a small group of well-funded technology companies.
3/ Remember and amplify your deep belief about why you're doing what you're doing.
I knew that labeled data really mattered for the next 20 years, and I knew I could do it better than how it was done today.
Good ideas are especially good if few people agree. It's really true.
— Alexandr Wang (@alexandr_wang) August 6, 2019
Over the past three years, Scale AI has scaled up its services and now has a rich and diverse portfolio of customers, including Waymo, OpenAI, Airbnb and many others. This success owes a great deal to the domains that were chosen. Here are a few of its diverse applications:
- Mapping: Using satellite, drone, and street-level imagery for use cases from agriculture to insurance.
- Offline retail: Cashierless checkouts, inventory management systems.
- AR and VR systems used in gaming, real estate or manufacturing.
“Machine learning is likely the most important technological shift of our time, and the overall benefits to the world will be comparable to those of the internet,” wrote Wang after the company secured funds in its latest round.
To date, Scale AI has raised $122.7 million in funding at a valuation of over $1 billion. With its latest round, a Series C led by Founders Fund with continued investment from Index Ventures, Accel and Y Combinator, the future looks bright for Scale.
The success of Scale will, hopefully, open up more avenues for consumers and providers alike.
Data Labeling As An Industry
The debate that still keeps people busy is whether AI creates jobs or takes them away. The end game of this highly ambitious journey towards AGI cannot be foretold, but those who are in favour of AI can find some comfort in the fact that data labeling as a service has become a thing.
For example, in India, where the AI scene is yet to leave the crib, a few startups have already started to offer data labeling services. And it doesn't stop there. A few individuals have been running this service remotely. They hire a group of people whose only job is to sit and annotate the images in a dataset. They see an image of a car and they label it as ‘car’, and so on.
This labeled data is then offered to the companies which build machine learning models for various applications that deal with image recognition.
Considering that data gathering and cleaning consume the better part of a typical ML engineer's time, data annotators have found a niche within the vast world of data-driven solutions.