In partnership with City, University of London, Microsoft has introduced the ORBIT dataset for accelerating real-world few-shot learning using teachable object recognition. The benchmark dataset contains 3,822 videos of 486 objects, recorded on mobile phones by 77 people who are blind or have low vision, for a total of 2,687,934 frames.
The team has made code for the dataset, for computing benchmark metrics, and for running baselines available on the ORBIT dataset GitHub page. The ORBIT dataset shows objects in realistic conditions: poorly framed or blurry shots, objects occluded by hands or other items, and wide variation in background, lighting, and object orientation. By releasing it, the researchers want to help the machine learning community speed up research in few-shot, high-variation object recognition and explore new research directions in few-shot video recognition.
What are object recognition systems?
Object recognition is a field of computer vision concerned with finding and identifying objects in an image or video sequence. Object recognition systems have improved greatly in recent years, but problems persist. According to the research paper, most systems depend on training datasets with large numbers of high-quality labelled examples in each category, which makes them expensive to build.
Few-shot learning method
This is where few-shot learning comes into play: models are trained to recognise novel objects from only a few examples. However, most few-shot research relies on datasets with little variation in the number of examples per object or in the quality of those examples. Datasets are needed that capture the high variation typical of real-world applications.
What does the ORBIT dataset do?
The researchers point to teachable object recognisers (TORs) for people who are blind or have low vision as an application that is both few-shot and high-variation.
- A user teaches the system to recognise objects that are important to them by taking a few short videos of each one
- The videos are then used to train a personalised object recogniser
This lets a blind or low-vision person teach the recogniser important personal objects, such as their house keys or favourite shirt, and then recognise them with a phone. Typical object recognisers cannot do this because such personal objects are absent from common object recognition training datasets.
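To make the idea of teaching a recogniser from a few examples concrete, here is a minimal sketch of nearest-prototype classification, one common few-shot approach. It is an illustration only and is not claimed to be the method used in the ORBIT baselines. It assumes each video frame has already been turned into a feature vector by some embedding model; the function names `teach` and `recognise` and the toy two-dimensional features are hypothetical.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length feature vectors element by element."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def teach(user_examples):
    """Build one prototype per object from a user's few labelled examples.

    user_examples maps an object label to a list of feature vectors
    (assumed to come from frames of the user's teaching videos).
    """
    return {label: mean_vector(vecs) for label, vecs in user_examples.items()}


def recognise(prototypes, feature):
    """Return the label whose prototype is closest to the query feature."""
    return min(prototypes, key=lambda label: math.dist(prototypes[label], feature))


# Toy, made-up features for two personal objects.
examples = {
    "house keys": [[0.9, 0.1], [1.1, 0.0]],
    "favourite shirt": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = teach(examples)
prediction = recognise(prototypes, [1.0, 0.2])
```

In this toy example the query lies much closer to the "house keys" prototype, so that label is returned. A real system would also need an embedding model that copes with blurry, poorly framed, or occluded frames.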
Image: Microsoft (Images from the ORBIT dataset, showing the high variation embodied in user-submitted videos)
What about the benchmark?
The benchmark built on the ORBIT dataset is innovative as well. Unlike typical computer vision benchmarks, it measures performance per user, based on the objects and videos each user supplied.
The trained model is given the objects and associated videos for a single user, and how well it recognises that user’s objects indicates how effective the model is. This is repeated for each user in a set of test users. The result is a suite of metrics that shows how well a teachable object recogniser would work for an individual user in a real-world scenario.
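The per-user evaluation described above can be sketched as follows. This is a simplified illustration, not the benchmark's actual code: it assumes a single metric (the fraction of frames labelled correctly) and made-up predictions, whereas the real benchmark reports a suite of metrics. The function names and user IDs are hypothetical.

```python
from statistics import mean, pstdev


def frame_accuracy(predictions, targets):
    """Fraction of frames whose predicted label matches the true label."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


def evaluate_per_user(results):
    """Score each test user separately, then summarise across users.

    results maps a user ID to a (predictions, targets) pair for that
    user's test frames. Returns the per-user scores, their mean, and
    their population standard deviation (the spread between users).
    """
    per_user = {
        user: frame_accuracy(preds, targets)
        for user, (preds, targets) in results.items()
    }
    scores = list(per_user.values())
    return per_user, mean(scores), pstdev(scores)


# Made-up predictions for two hypothetical test users.
results = {
    "user_a": (["keys", "keys", "shirt", "mug"], ["keys", "keys", "shirt", "shirt"]),
    "user_b": (["cane", "wallet"], ["cane", "cane"]),
}
per_user, average, spread = evaluate_per_user(results)
```

Averaging over users rather than over all frames means that a user who submitted many videos does not dominate the score, and the spread makes differences between users visible.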
Image: Microsoft (Performance on highly cited few-shot learning models is saturated on existing benchmarks)
Making AI More Inclusive
The project was funded by Microsoft AI for Accessibility. Microsoft researchers have been working for some time to make AI systems more inclusive of people with disabilities. The company is teaming up with AI for Accessibility grantees to produce representative training datasets such as ORBIT and the Microsoft Ability Initiative with researchers at the University of Texas at Austin.
What does the evaluation show?
Evaluations of highly cited few-shot learning models show considerable room for innovation in high-variation, few-shot learning. These models achieve only 50-55 per cent accuracy on the teachable object recognition benchmark, even though their performance has saturated on existing few-shot benchmarks. Performance also varies widely between users.
Challenges in building teachable object recognition systems
- Feedback: users need feedback on the quality of the data they provide when teaching the system a new personal object.
- Delivering this feedback will require subtlety in user interaction.
- Engineering: models must be adapted to run on resource-constrained devices such as mobile phones.
Technology has advanced by leaps and bounds in the last few years, and making its benefits accessible to everyone is of prime importance. Machine learning researchers often lack access to datasets that include people with disabilities. This hinders the building of intelligent solutions for people with disabilities and can even affect decision making, because a large group of people is excluded from such datasets. Steps like this by big tech companies can help bridge the gap and make AI and its benefits accessible to everyone.