DeepLearning.AI’s Andrew Ng has announced the winners of the Data-Centric AI competition. The winners for best overall performance are Divakar Roy; Shashank Despande, Chris Anderson and Rob Walsh of Innotescus; and Asfandyar Azhar and Nidhish Shah of Synaptic AnN. In the Most Innovative category, the winners are Mohammad Motamedi, Johnson Kuan and GoDataDriven | Part of Xebia.
Rules of the competition
- Participants were given ~3K images of handwritten Roman numerals from 1–10. The task was to optimise the model performance in classifying Roman numerals.
- Participants also received a label book of 52 images to use as a small test set for their own experiments. This label book was not used in the final evaluation.
- The model architecture was held fixed (a truncated ResNet50) and trained for 100 epochs, with the final model weights taken from the epoch that achieved the best accuracy on the validation set.
- Though the model and training procedure were kept fixed, participants were free to improve the dataset and change the training and validation data splits.
- Adding images was also allowed, but submissions had to contain fewer than 10K images combined across the training and validation splits.
- Upon submission of the improved dataset, participants were evaluated against a hidden test set of images.
- A maximum of five submissions per week (65 in total over the course of the competition) was allowed.
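The size rule above is easy to check before submitting. The following is a minimal sketch, not the organisers' tooling: it assumes a hypothetical folder layout with `train` and `val` directories under a submission root, and an assumed set of image file extensions, neither of which is specified in the rules as reported here.

```python
from pathlib import Path

# Limit taken from the rules: fewer than 10K images across train + val.
MAX_IMAGES = 10_000
# Assumption: the accepted image formats are not stated, so these are illustrative.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def count_images(split_dir: Path) -> int:
    """Count image files anywhere under split_dir (0 if it does not exist)."""
    if not split_dir.is_dir():
        return 0
    return sum(
        1
        for path in split_dir.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


def check_submission(root: Path) -> dict:
    """Return per-split counts, the combined total, and whether it fits the limit."""
    # Assumption: splits live in hypothetical "train" and "val" subfolders.
    train = count_images(root / "train")
    val = count_images(root / "val")
    total = train + val
    return {
        "train": train,
        "val": val,
        "total": total,
        "within_limit": total < MAX_IMAGES,
    }
```

Calling `check_submission(Path("my_submission"))` returns the counts for each split and a `within_limit` flag, so a participant could confirm the combined total stays under 10K after adding or removing images.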
Because fewer than 10K images were allowed, participants had to focus on getting “Good Data” in the absence of “Big Data”. Andrew Ng feels that this situation is very common in AI applications for more traditional industries.
Best Overall Performance
- Divakar Roy, a software engineer working at Findmeashoe.com based out of Bengaluru.
His interests lie in software development and testing, problem-solving, image and video processing, 3D visualisation & measurement and code porting between C, CUDA, Python and MATLAB.
In a LinkedIn post, Roy called this win a highlight of his professional career.
- Shashank Despande, Chris Anderson and Rob Walsh of Innotescus (a data visualisation and image and video annotation platform).
The company said it aims to enable customers to deploy the most reliable, unbiased computer vision models faster by demystifying the preparation and analysis of the most challenging machine learning datasets.
- Asfandyar Azhar and Nidhish Shah of Synaptic AnN.
Shah is a computer science student at the Eindhoven University of Technology, and Azhar is pursuing a combined BSc/MSc (Honors) course in Data Science and AI at the same university.
In a LinkedIn post, Shah said that he learned a lot about the most innovative approaches to data-centric AI and how to democratise them.
Most Innovative
- Mohammad Motamedi, who works as a senior software engineer – deep learning and AI technology at NVIDIA.
- Johnson Kuan, who is Director of data science and AI/ML at DIRECTV. His role is to lead the implementation of MLOps to accelerate the development and deployment of ML models. He also helps to drive the enablement and adoption of the latest, most impactful AI/ML techniques.
- GoDataDriven | Part of Xebia, which offers data and AI services, consultancy and training for the top 200 companies in the Netherlands and abroad.
What exactly is Data-Centric AI?
Data-centric AI focuses on improving the quality of the data used to train a model rather than on improving the algorithm. This is the opposite of model-centric AI, which aims to collect as much data as possible and build a model capable of dealing with the noise in that data. Andrew Ng has argued that improving the data is often a much more effective way to improve performance.
Myths about Data-centric AI
Because data-centric AI is an emerging field, there are many myths and much confusion about it. Andrew Ng points out some of them:
- Data-centric AI doesn’t address the critical problem of building responsible AI
- It is just another name for applied machine learning
- It just means paying more attention to data
- It just means better processing of data
- It is only about labelling
- It only works for unstructured data
The top three winners from each of the two categories (Best Performance Overall and Most Innovative) will be invited to a private event with Andrew Ng to share ideas about how to grow the data-centric movement.