“Andrew Ng will soon be launching a campaign, and a competition, to push for data-centric models.”
A lot of people joke that 80 percent of machine learning is simply data cleaning. Many also look at machine learning as a glorified, technical version of statistics, a field that places a great deal of importance on data. If anything, this tells us one thing for sure: data is critical. Even Andrew Ng, a well-known face in the ML community, has stressed that ML needs to take a data-centric stance rather than a model-centric one.
Nearly 90 percent of ML models built globally never make it into use, primarily because they cannot adjust to the variety of information found in real-world applications. In a 2020 survey, only 22 percent of companies had put their models to use, and many of those models took as long as 12 months to reach users. Traditional software is powered by code, whereas AI systems are powered by both code and data. However, many software developers still work on code and model architectures rather than data when their ML models run into trouble.
Earlier this year, Andrew Ng brought attention to MLOps, which deals with utilising machine learning models in production systems. He believes that focusing on data here, instead of only working on improving one’s code, could unlock a multitude of new multimillion-dollar applications of artificial intelligence. He claims that current architectures are already highly evolved for identifying photographs, recognising speech or generating text, so tinkering with their architecture is perhaps no longer the best way to make them perform better.
Ushering next gen AI
The solution Andrew Ng has proposed is to set aside the architecture of an AI model and focus on what it is working with, i.e. the data. By paying close attention to what a model learns, improving the quality of the data, and then retraining the ML model, engineers can build higher-quality systems in a much shorter time.
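To make the idea concrete, here is a minimal sketch in Python of what "fix the model, improve the data, retrain" can look like. It is an illustration under stated assumptions, not Ng's or Landing AI's method: the toy dataset, the 1-nearest-neighbour model, and the helper `relabel_by_neighbours` are all hypothetical choices made for this example. The model code never changes; only the training labels do.

```python
# Toy one-dimensional dataset (hypothetical): class 0 sits near 0-90, class 1 near 200-290.
train_x = list(range(0, 100, 10)) + list(range(200, 300, 10))
true_y = [0] * 10 + [1] * 10

# Simulate annotation errors by flipping two training labels.
noisy_y = list(true_y)
noisy_y[3] = 1   # x = 30 wrongly labelled as class 1
noisy_y[15] = 0  # x = 250 wrongly labelled as class 0

# Clean held-out points, slightly offset from the training points.
test_x = [x + 2 for x in train_x]
test_y = list(true_y)


def predict_1nn(xs, ys, query):
    """The fixed model: return the label of the single nearest training point."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - query))
    return ys[nearest]


def accuracy(xs, ys, queries, expected):
    """Fraction of query points the fixed model labels correctly."""
    hits = sum(predict_1nn(xs, ys, q) == e for q, e in zip(queries, expected))
    return hits / len(queries)


def relabel_by_neighbours(xs, ys, k=3):
    """Hypothetical data fix: give each point the majority label of its
    k nearest neighbours (excluding itself). Assumes binary labels and odd k."""
    cleaned = []
    for i, x in enumerate(xs):
        others = sorted(
            (j for j in range(len(xs)) if j != i),
            key=lambda j: abs(xs[j] - x),
        )
        votes = [ys[j] for j in others[:k]]
        cleaned.append(1 if sum(votes) > k / 2 else 0)
    return cleaned


# Same model both times; only the training labels differ.
baseline = accuracy(train_x, noisy_y, test_x, test_y)
cleaned_y = relabel_by_neighbours(train_x, noisy_y)
improved = accuracy(train_x, cleaned_y, test_x, test_y)

print(f"accuracy with noisy labels:   {baseline:.2f}")
print(f"accuracy with cleaned labels: {improved:.2f}")
```

In this toy setup, the two flipped labels cause the unchanged model to misclassify the held-out points closest to them, and correcting the labels removes those errors. Real datasets are far messier, and a neighbour-vote heuristic like this one can also overwrite labels that were unusual but correct, so it is best read as a picture of the workflow rather than a recommended cleaning method.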
Andrew Ng will be launching a campaign to explain this viewpoint on June 17th 2021. The campaign will kick off with a competition run by Landing AI, a company founded by Ng to increase the use of AI in traditional industries, in which contestants will compete to attain the best performance by amending the data used by an otherwise fixed model. The competition will end on September 4th, which happens to coincide with the birthday of John McCarthy, who came up with the term artificial intelligence. The top three winners will be invited to a private roundtable event with Andrew Ng himself, where they will have the opportunity to discuss their ideas and thoughts with everyone present.
Andrew Ng says that he hopes the competition will change the decades-old model-centric tradition held by developers. Despite this tradition, a lot of research backs Ng’s data-centric viewpoint. A Cambridge study reported that the most critical but often overlooked aspect of ML models is data dispersion. Smaller datasets tend to be noisier, while larger ones are more difficult to label. Both create significant bottlenecks when deploying ML solutions in the real world.
Keeping this in mind, Ng says that the shift to data-driven practices will help solve various challenges that AI currently faces. These include learning how to perform a task from tens of thousands of data points (instead of the current millions!), learning to understand when humans do not agree (e.g. when different medical experts don’t agree on a diagnosis), picking up inconsistencies among data sources, handling changes in data over time caused by something like changes in behaviour, and creating useful synthetic data when actual data is not abundantly available.
Bringing about this massive paradigm shift in how AI is built will not be easy. Andrew Ng feels that it will require as much research and development as the shift from ‘old-fashioned AI’ to deep learning has in recent decades. Andrew Ng’s DeepLearning.AI is launching a course to teach this data-centric approach on easy-to-reach platforms like Coursera (interestingly, also founded by Andrew Ng). He has also given various presentations on DeepLearning.AI’s YouTube channel and at the Amazon Web Services Machine Learning Summit. Andrew Ng believes that the right people can put this idea to constructive use to tackle many challenges in areas such as manufacturing, treating diseases, energy consumption and food production, all with the help of AI backed by the appropriate data.