Top Open-Source Datasets For Object Detection In 2021

Object detection, one of the challenging topics in computer vision, helps machines locate and identify real-world objects using digital images as inputs. Here, we list the top open-source datasets one can use for object detection projects.

(The list is in no particular order)

1| MS COCO

COCO is a large-scale object detection dataset that addresses three core research problems in scene understanding: detecting non-iconic views (or non-canonical perspectives) of objects, contextual reasoning between objects, and precise 2D localisation of objects. Its features include object segmentation, recognition in context, superpixel stuff segmentation, 1.5 million object instances, 80 object categories and more.


Know more here.
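
COCO annotations ship as JSON files with `images`, `annotations` and `categories` arrays, and each bounding box is stored as `[x, y, width, height]` in pixels, where (x, y) is the top-left corner. As a minimal sketch using only Python's standard library (the helper `load_boxes_by_image` and the sample record are hypothetical, and real files carry extra fields such as `segmentation`, `area` and `iscrowd` that are ignored here), the boxes can be grouped by image and converted to corner coordinates:

```python
import json
from collections import defaultdict


def load_boxes_by_image(coco_json: str) -> dict[str, list[tuple[str, list[float]]]]:
    """Group COCO-style boxes by image file name (hypothetical helper).

    COCO stores each box as [x, y, width, height] in pixels; this sketch
    converts it to [x_min, y_min, x_max, y_max].
    """
    data = json.loads(coco_json)
    # Map numeric ids to readable file and category names.
    file_names = {img["id"]: img["file_name"] for img in data["images"]}
    cat_names = {cat["id"]: cat["name"] for cat in data["categories"]}

    boxes = defaultdict(list)
    for ann in data["annotations"]:
        x, y, w, h = ann["bbox"]
        boxes[file_names[ann["image_id"]]].append(
            (cat_names[ann["category_id"]], [x, y, x + w, y + h])
        )
    return dict(boxes)


# Made-up sample in the COCO layout (not taken from the real dataset).
sample = json.dumps({
    "images": [{"id": 1, "file_name": "000000000001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 18, "name": "dog"}],
    "annotations": [{"id": 7, "image_id": 1, "category_id": 18, "bbox": [10.0, 20.0, 100.0, 50.0]}],
})
print(load_boxes_by_image(sample))
# {'000000000001.jpg': [('dog', [10.0, 20.0, 110.0, 70.0])]}
```

In practice, the `pycocotools` package is the usual way to read these files; the sketch only shows the underlying layout.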

2| Exclusively Dark (ExDark) Image Dataset

The Exclusively Dark (ExDark) dataset is a unique collection of low-light images that serves as a benchmark for low-light research and brings together different areas of expertise, such as image understanding, image enhancement and object detection, to focus on low-light conditions. It contains 7,363 images captured in 10 different lighting conditions, ranging from very low light to twilight, with 12 object classes (similar to PASCAL VOC) annotated both at the image-class level and with local object bounding boxes.

Know more here.


3| 20BN-SOMETHING-SOMETHING

The 20BN-SOMETHING-SOMETHING dataset is a large-scale collection of labelled video clips showing humans performing predefined basic actions with various objects. It allows machine learning models to develop a fine-grained understanding of basic actions in the everyday physical world.

Know more here.

4| CIFAR-10

CIFAR-10 consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. It is split into 50,000 training images, divided into five training batches of 10,000 images each, and 10,000 test images.

Know more here.
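
The "python version" of CIFAR-10 stores each of the five training batches (`data_batch_1` to `data_batch_5`) and the `test_batch` as a pickled dictionary whose `b"data"` entry is a 10,000 x 3072 `uint8` array and whose `b"labels"` entry lists class indices from 0 to 9. Each row holds the 1,024 red values, then the 1,024 green values, then the 1,024 blue values of one 32x32 image. A minimal sketch, assuming NumPy and that layout (the function names are hypothetical):

```python
import pickle

import numpy as np


def rows_to_images(rows: np.ndarray) -> np.ndarray:
    """Turn (N, 3072) CIFAR-10 rows into (N, 32, 32, 3) images."""
    # Each row is three 32x32 planes (red, green, blue) in turn, so reshape
    # to (N, channel, height, width) and then move the channel axis last.
    return rows.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)


def load_cifar10_batch(path: str) -> tuple[np.ndarray, np.ndarray]:
    """Load one pickled batch file as (images, labels) (hypothetical helper)."""
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    return rows_to_images(batch[b"data"]), np.asarray(batch[b"labels"])


# Check the reshaping on synthetic rows instead of the real files.
rows = np.arange(2 * 3072, dtype=np.int64).reshape(2, 3072)
imgs = rows_to_images(rows)
print(imgs.shape)              # (2, 32, 32, 3)
print(imgs[0, 0, 0].tolist())  # [0, 1024, 2048]: red, green, blue of pixel (0, 0)
```

Loading all five training batches this way yields 5 x 10,000 = 50,000 training images, and `test_batch` supplies the remaining 10,000.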

5| LISA Traffic Sign Detection Dataset

The LISA (Laboratory for Intelligent & Safe Automobiles) Traffic Sign Dataset is a set of annotated frames and videos containing US traffic signs. It includes images captured with different cameras, 47 US sign types and 7,855 annotations on 6,610 frames. The dataset was released in two stages: one with images only and one with both videos and images.

Know more here.

6| Open Images

Open Images is a dataset of around 9 million images annotated with image-level labels, object bounding boxes, object segmentation masks, visual relationships and localised narratives. It contains 16 million bounding boxes for 600 object classes on 1.9 million images, making it the largest existing dataset with object location annotations. Most of the boxes were drawn manually by professional annotators to ensure accuracy and consistency. The visual relationship annotations indicate pairs of objects in particular relations, object properties and human actions.

Know more here.
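
Open Images box annotations are distributed as CSV files in which `XMin`, `XMax`, `YMin` and `YMax` are normalised to the range [0, 1] rather than given in pixels, so they must be scaled by each image's width and height before use. A minimal sketch, assuming the image sizes have already been read from the image files (the function name, the sample row and its label ID are made up, and the sample keeps only the columns this sketch needs; the real files contain more, such as `IsOccluded` and `IsGroupOf`):

```python
import csv
import io


def to_pixel_boxes(
    csv_text: str, sizes: dict[str, tuple[int, int]]
) -> list[tuple[str, str, int, int, int, int]]:
    """Scale normalised Open Images-style boxes to pixels (hypothetical helper).

    `sizes` maps ImageID -> (width, height). Returns
    (ImageID, LabelName, x_min, y_min, x_max, y_max) in pixels.
    """
    out = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        w, h = sizes[row["ImageID"]]
        out.append((
            row["ImageID"],
            row["LabelName"],
            round(float(row["XMin"]) * w),
            round(float(row["YMin"]) * h),
            round(float(row["XMax"]) * w),
            round(float(row["YMax"]) * h),
        ))
    return out


# Made-up sample row; real label names are machine IDs of the form /m/...
sample = (
    "ImageID,LabelName,XMin,XMax,YMin,YMax\n"
    "img001,/m/hypothetical,0.25,0.75,0.1,0.5\n"
)
print(to_pixel_boxes(sample, {"img001": (640, 480)}))
# [('img001', '/m/hypothetical', 160, 48, 480, 240)]
```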


7| BDD100K

BDD100K is a driving dataset for heterogeneous multitask learning. It includes 100K videos and ten tasks for evaluating the progress of image recognition algorithms for autonomous driving. The tasks are image tagging, road object detection, lane detection, drivable area segmentation, semantic segmentation, instance segmentation, multi-object detection tracking, multi-object segmentation tracking, domain adaptation and imitation learning.

Know more here.

8| ImageNet

ImageNet is an image dataset organised according to the WordNet hierarchy, in which each node of the hierarchy is depicted by hundreds to thousands of images. The dataset grew out of two crucial needs in computer vision research: first, to establish a North Star problem for the field, and second, to provide far more data to enable more generalisable machine learning methods.

Know more here.
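
Because images are attached to the nodes of a hierarchy, the images available for a broad concept include those of all its more specific descendants. The toy sketch below illustrates this with made-up node names and image counts; it treats the hierarchy as a simple tree (WordNet itself allows some nodes to have more than one parent), real ImageNet nodes are WordNet synsets identified by IDs such as `n01440764`, and none of this is ImageNet's own tooling:

```python
# Toy hierarchy (parent -> children); all names and counts are hypothetical.
CHILDREN = {
    "animal": ["dog", "cat"],
    "dog": ["terrier", "retriever"],
}
# Images attached directly to each node.
IMAGES = {"animal": 0, "dog": 120, "cat": 900, "terrier": 1300, "retriever": 1100}


def subtree_image_count(node: str) -> int:
    """Images at `node` plus those at all of its descendants (depth-first)."""
    return IMAGES.get(node, 0) + sum(
        subtree_image_count(child) for child in CHILDREN.get(node, [])
    )


print(subtree_image_count("dog"))     # 2520
print(subtree_image_count("animal"))  # 3420
```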

Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
