How to query data for Computer Vision tasks using VisionKG?

In this article, we will discuss the VisionKG in detail and will see how it can query the dataset like COCO and ImageNet.

Most of us know SQL as a query language for retrieving relational data from almost any database. When it comes to gathering data for computer vision tasks, however, we have to rely on the separate hosts of datasets such as ImageNet and COCO. Researchers have recently proposed a framework called VisionKG that integrates those datasets seamlessly. This article walks through the framework and shows how it can query datasets such as COCO and ImageNet. The major points to be discussed are listed below.

Table of contents

  1. The general idea of VisionKG
  2. How does it query the data?
  3. Practical example

Let’s first discuss what this framework offers.

The general idea of VisionKG

VisionKG is a unified framework for datasets in data-centric AI. Within it, diverse datasets are integrated and linked together, not only from one AI domain but also from multiple AI areas.

Existing resources, such as ConceptNet and Wikidata, have a similar purpose in that they integrate data from various sources and make it public. However, they focus on certain application domains, and none of them is linked to datasets in other areas, such as computer vision. Scene graphs, on the other hand, were introduced in the computer vision field to model the relationships between objects detected in images.

They lack cross-domain compatibility, however, and cannot be queried using common query languages. As a result, existing resources should be streamlined, and new resources should be easy to incorporate. Such integration is advantageous: for example, it can help prevent distribution shifts and support the development of more robust models for training and testing.

Furthermore, the shift from model tinkering to a deep understanding of data requires datasets to be better organized. This approach should also broaden the term “data” to include not only training data but also abstract information, such as commonsense or causal relationships.

In a nutshell, it is a single framework for various datasets that allows them to be readily merged and queried, for example, using standard query languages.

How does it query the data?

To realize the concepts stated above, researchers created VisionKG, a unified knowledge graph for CV datasets (e.g., COCO). VisionKG is a knowledge graph based on the Resource Description Framework (RDF). It contains RDF statements that describe the metadata of images and the semantics of their annotations.

The World Wide Web Consortium (W3C) recommends RDF as a standardized data model for semantic data integration and as a formal representation for shared human-machine understanding. As a result, RDF can represent the semantic structures of prominent label taxonomies like WordNet, ConceptNet, and Freebase, which are used in a variety of CV datasets including ImageNet and Open Images.

The process of creating VisionKG is depicted in the diagram above. It begins by gathering CV datasets and extracting annotation labels from them (step 1 in the diagram). It follows the Linked Data principles and uses the RDF data model to create a unified data model for the annotation labels and visual features. Uniform Resource Identifiers (URIs) are used to name data entities (such as images, boxes, and labels). The RDF data model expresses data as triples of the form <subject, predicate, object>.

To describe “an image contains a bounding box for a person” in the COCO dataset, we must first assign unique URIs to the image and the bounding box, and then create the following triples: <img01, hasBox, box01>, <box01, hasObject, obj01>, <obj01, rdf:type, Person>.

Predefined predicates include hasBox, hasObject, and rdf:type, where rdf:type expresses that an object or image belongs to a specific class or type in the knowledge base, such as Person. We can also add metadata and semantic annotations to the images, such as where the images came from or the relationships between the boxes in an image (as shown in step 2 of the figure above).
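To make the triple idea concrete, here is a minimal sketch that stores the three example triples as plain Python tuples and follows the image, box, object, class chain. It is not a real RDF store and not VisionKG code; the helper names match and images_containing are made up for this illustration.

```python
# Minimal sketch: RDF-style triples as plain Python tuples (not a real RDF store).
# The names img01, box01 and obj01 follow the example in the text.
triples = [
    ("img01", "hasBox", "box01"),
    ("box01", "hasObject", "obj01"),
    ("obj01", "rdf:type", "Person"),
]


def match(pattern, store):
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


def images_containing(cls, store):
    """Follow class <- object <- box <- image links to find images of a class."""
    found = set()
    for obj, _, _ in match((None, "rdf:type", cls), store):
        for box, _, _ in match((None, "hasObject", obj), store):
            for img, _, _ in match((None, "hasBox", box), store):
                found.add(img)
    return sorted(found)


print(images_containing("Person", triples))
```

A query language such as SPARQL expresses the same chain declaratively as a set of triple patterns that share variables, instead of nested loops.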

Dataset queries and analysis, in particular, can be performed using rich semantic query languages such as SPARQL. SPARQL lets users describe queries as patterns over RDF statements, in a style similar to SQL statements. The first application of VisionKG is obtaining mixed datasets in an elegant manner.

For example, one can query for images of people from COCO, KITTI, and Visual Genome using a simple query (snippet 4 in the figure below), instead of a more complex query (snippet 3 in the figure below) that covers all possible cases, such as a pedestrian in KITTI or a man in Visual Genome.

The diagram above depicts an example of mapping labels in COCO, KITTI, and Visual Genome to knowledge base classes (section 1). Labels are then expanded to reflect the class hierarchy (section 2). Finally, it shows two equivalent queries for retrieving images that contain Person from the COCO, KITTI, and Visual Genome datasets (sections 3 and 4).
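The label mapping and class hierarchy described above can be sketched in a few lines of plain Python. This is only an illustration of the idea: the mapping table, the hierarchy, and the file names below are hypothetical and are not taken from VisionKG itself.

```python
# Hypothetical mapping from (dataset, raw label) to a knowledge base class.
LABEL_TO_CLASS = {
    ("COCO", "person"): "Person",
    ("KITTI", "pedestrian"): "Pedestrian",
    ("VisualGenome", "man"): "Man",
}

# Hypothetical class hierarchy, stored as child -> parent.
SUBCLASS_OF = {"Pedestrian": "Person", "Man": "Person"}


def is_a(cls, target):
    """Walk up the hierarchy and report whether cls is target or a descendant."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


# Made-up annotations: (image file, source dataset, raw label).
annotations = [
    ("coco_001.jpg", "COCO", "person"),
    ("kitti_042.png", "KITTI", "pedestrian"),
    ("vg_007.jpg", "VisualGenome", "man"),
    ("coco_002.jpg", "COCO", "car"),
]


def images_of(target, annotations):
    """Return images whose mapped class is the target class or a subclass of it."""
    result = []
    for image, dataset, label in annotations:
        cls = LABEL_TO_CLASS.get((dataset, label))
        if cls is not None and is_a(cls, target):
            result.append(image)
    return result


print(images_of("Person", annotations))
```

Because every dataset-specific label resolves to a shared class, a single request for Person covers all three datasets without listing each label by hand.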

Practical example

In this section, we’ll look at some sample VisionKG queries that let us extract data on the fly from Python.

Let’s quickly set up the environment and import dependencies.

from google.colab import output
# install our vision api
!python -m pip install git+ --force
# import SemkgAPI
from vision_utils import semkg_api, data
from skimage import io
import matplotlib.pyplot as plt

Next, we define a variable called query_ that holds the query statement. Here the query asks for images that contain both a car and a truck.

# Query string
query_='''#Give me 100 images containing car and truck
prefix cv:<>
SELECT DISTINCT ?image
WHERE {
    ?ann1 a cv:Annotation.
    ?ann1 cv:isAnnotationOfImage ?image.
    ?ann1 cv:hasAnnotatedObject ?obj1.
    ?obj1 cv:hasLabel "car".
    ?ann2 a cv:Annotation.
    ?ann2 cv:isAnnotationOfImage ?image.
    ?ann2 cv:hasAnnotatedObject ?obj2.
    ?obj2 cv:hasLabel "truck".
    ?image cv:hasLocalPath ?localPath.
}
LIMIT 100'''
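The query repeats the same four patterns once per label, so it can also be generated for any list of labels. The sketch below is one possible way to do that. The function build_query is hypothetical and not part of the VisionKG API, the namespace IRI is a placeholder because the real one is not shown here, and the SELECT clause is an assumption about which variables are wanted.

```python
def build_query(labels, limit=100, prefix_iri="http://example.org/cv/"):
    """Build a SPARQL query for images that contain every label in `labels`.

    prefix_iri is a placeholder namespace; replace it with the real cv: IRI.
    Labels are assumed to be simple strings without quotes or backslashes.
    """
    lines = [
        f"PREFIX cv: <{prefix_iri}>",
        "SELECT DISTINCT ?image ?localPath",  # assumed projection
        "WHERE {",
    ]
    # One annotation/object pair per label, all tied to the same ?image.
    for n, label in enumerate(labels, start=1):
        lines += [
            f"    ?ann{n} a cv:Annotation .",
            f"    ?ann{n} cv:isAnnotationOfImage ?image .",
            f"    ?ann{n} cv:hasAnnotatedObject ?obj{n} .",
            f'    ?obj{n} cv:hasLabel "{label}" .',
        ]
    lines += [
        "    ?image cv:hasLocalPath ?localPath .",
        "}",
        f"LIMIT {limit}",
    ]
    return "\n".join(lines)


print(build_query(["car", "truck"]))
```

Because all patterns share the ?image variable, an image only matches when it has an annotation for each requested label.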

Next, the query defined above needs to be passed to VisionKG’s API as below.

#query and return result

All the query outcomes are stored in the result variable, which is a dictionary holding the metadata of the queried images. Now we’ll plot some samples of the queried images.

#display sample images
rows, cols = 2, 4  # grid size (assumed; the original values are not shown)
f, ax_arr = plt.subplots(rows, cols, figsize=(16,8))
for j, row in enumerate(ax_arr):
    for i, ax in enumerate(row):
        ax.axis('off')  # hide axes, including cells left empty
        if j*cols+i < len(result['images']):
            image = io.imread(semkg_api.SEMKG_IMAGES_HOST + result['images'][j*cols+i]['image_path'])
            ax.imshow(image)
f.suptitle("Sample images from the query result", fontsize=16)
plt.show()

The result looks like this:

Final words

In this article, we have seen a Python-based framework that retrieves computer vision data using SPARQL, a semantic query language for retrieving and manipulating data. You can experiment with this language yourself. With the retrieved data, we can go on to perform tasks like object detection, image classification, and more. The official repository has more examples.


Vijaysinh Lendave
Vijaysinh is an enthusiast in machine learning and deep learning. He is skilled in ML algorithms, data manipulation, handling and visualization, model building.
