
How to query data for Computer Vision tasks using VisionKG?

In this article, we will discuss VisionKG in detail and see how it can query datasets like COCO and ImageNet.

Most of us know SQL as a query language for retrieving relational data from almost any database. Gathering data for computer vision tasks is different: we have to depend on the separate hosts of datasets such as ImageNet and COCO. Recently, researchers proposed a framework called VisionKG that integrates these datasets so they can be queried together. Below are the major points to be discussed in this article.

Table of contents

  1. The general idea of VisionKG
  2. How it queries the data?
  3. Practical example

Let’s first discuss what this framework offers.

The general idea of VisionKG

VisionKG is a unified dataset framework for data-centric AI. It integrates and links diverse datasets within one AI domain, and it can also connect datasets from multiple AI areas.

Existing resources, such as ConceptNet and Wikidata, have a similar purpose in that they integrate data from various sources and make it public. However, they focus on certain application domains, and none of them is linked to datasets in other areas, such as computer vision. Scene graphs, on the other hand, were introduced in computer vision to model the relationships between objects identified in images.

Scene graphs lack cross-domain compatibility, however, and cannot be queried using common query languages. As a result, current resources should be unified, and new resources should be easy to incorporate. This has practical advantages: for example, it can help prevent distribution shifts and support the development of more robust models in training and testing.

Furthermore, the transition from model tinkering to a deeper understanding of data requires datasets to be better organized. This approach also broadens the term “data” to include not only training data but also abstract information, such as commonsense or causal relationships.

In a nutshell, it is a single framework for various datasets that allows them to be readily merged and queried, for example, using standard query languages.

How it queries the data?

To realize the concepts stated above, researchers created VisionKG, a unified knowledge graph for CV datasets (e.g., COCO). VisionKG is a knowledge graph based on the Resource Description Framework (RDF). It contains RDF statements describing the metadata of images and the semantics of their annotations.

The World Wide Web Consortium (W3C) recommends RDF as a standardized data model for semantic data integration and as a formal representation for shared human-machine understanding. As a result, RDF can represent the semantic structures of prominent label taxonomies like WordNet, ConceptNet, and Freebase, which are used in a variety of CV datasets including ImageNet and Open Images.

The process of creating VisionKG is depicted in the diagram above. It begins by gathering CV datasets and extracting annotation labels from them. It follows the Linked Data principles and uses the RDF data model to create a unified data model for the annotation labels and visual features. Uniform Resource Identifiers (URIs) are used to name data entities (such as images, boxes, and labels). The RDF data model expresses data as triples of the form <subject, predicate, object>.

To describe “an image contains a bounding box for a person” in the COCO dataset, we must first assign unique URIs to the image and the bounding box, e.g., vision.semkg.org/img01 and vision.semkg.org/box01. We can then create the following triples: <img01, hasBox, box01>, <box01, hasObject, obj01>, <obj01, rdf:type, Person>.

Predefined predicates include hasBox, hasObject, and rdf:type, with rdf:type expressing that an object or image belongs to a specific class or type in the knowledge base, such as Person. We can also add metadata and semantic annotations to the images, such as where the images came from or the relationships between the boxes in an image (as shown in the 2nd step of the figure above).
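The triples above can be sketched in plain Python to show how a graph pattern is matched. This is only an illustration: the tuple-based store and the helper names match and images_with_class are hypothetical and not part of VisionKG, and the short names stand in for full URIs.

```python
# Toy triple store: each RDF statement is a (subject, predicate, object) tuple.
# Short names stand in for full URIs such as vision.semkg.org/img01.
triples = {
    ("img01", "hasBox", "box01"),
    ("box01", "hasObject", "obj01"),
    ("obj01", "rdf:type", "Person"),
}


def match(store, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def images_with_class(store, cls):
    """Follow hasBox -> hasObject -> rdf:type to find images containing cls."""
    found = set()
    for img, _, box in match(store, p="hasBox"):
        for _, _, obj in match(store, s=box, p="hasObject"):
            if match(store, s=obj, p="rdf:type", o=cls):
                found.add(img)
    return sorted(found)


print(images_with_class(triples, "Person"))  # ['img01']
```

A real RDF store performs this kind of pattern matching for us, which is what makes a declarative query language practical.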

In particular, datasets can be queried and analyzed using rich semantic query languages such as SPARQL. SPARQL lets users describe queries as patterns over RDF statements, in a style similar to SQL statements. The first application of VisionKG is to obtain mixed datasets in an elegant manner.

For example, one can query for images of people from COCO, KITTI, and Visual Genome using a simple query (snippet 4 in the figure below). The alternative is a more complex query (snippet 3) that covers all possible cases, such as pedestrian in KITTI or man in Visual Genome.

The diagram above depicts an example of mapping labels in COCO, KITTI, and Visual Genome to knowledge base classes (section 1). The labels are then expanded to reflect the class hierarchy (section 2). Sections 3 and 4 show two equivalent queries for retrieving images that contain Person from the COCO, KITTI, and Visual Genome datasets.
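The idea of label mapping and hierarchy expansion can be sketched in a few lines of Python. The mapping and the class hierarchy below are assumptions made up for illustration (they are not taken from the VisionKG ontology), as are the helper names ancestors and labels_for.

```python
# Hypothetical mapping from (dataset, label) to a knowledge base class.
LABEL_TO_CLASS = {
    ("COCO", "person"): "Person",
    ("KITTI", "pedestrian"): "Pedestrian",
    ("VisualGenome", "man"): "Man",
}

# Hypothetical class hierarchy: child class -> parent class.
SUBCLASS_OF = {"Pedestrian": "Person", "Man": "Person"}


def ancestors(cls):
    """Return cls followed by every superclass reachable via SUBCLASS_OF."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def labels_for(cls):
    """List (dataset, label) pairs whose class is cls or a subclass of cls."""
    return sorted(
        key for key, mapped in LABEL_TO_CLASS.items() if cls in ancestors(mapped)
    )


print(labels_for("Person"))
```

With such a hierarchy in place, asking for Person once covers all three dataset-specific labels, which is why the simple query can replace the one that enumerates every case.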

Practical example

In this section, we’ll look at a sample VisionKG query and use it to retrieve data on the fly from Python.

Let’s quickly set up the environment and import dependencies.

from google.colab import output
# install our vision api
!python -m pip install git+https://github.com/cqels/vision.git --force
output.clear()
 
# import SemkgAPI
from vision_utils import semkg_api, data
from skimage import io
import matplotlib.pyplot as plt

Next, we define a variable called query_ that holds the SPARQL query. Here the query asks for up to 100 images that contain both a car and a truck.

# Query string
query_='''#Give me 100 images containing car and truck
prefix cv:<http://vision.semkg.org/onto/v0.1/>
SELECT DISTINCT ?image
WHERE {
    ?ann1 a cv:Annotation.
    ?ann1 cv:isAnnotationOfImage ?image.
    ?ann1 cv:hasAnnotatedObject ?obj1.
    ?obj1 cv:hasLabel "car".
    ?ann2 a cv:Annotation.
    ?ann2 cv:isAnnotationOfImage ?image.
    ?ann2 cv:hasAnnotatedObject ?obj2.
    ?obj2 cv:hasLabel "truck".
    ?image cv:hasLocalPath ?localPath.
}
LIMIT 100'''

Next, the query defined above is passed to VisionKG’s API as shown below.

#query and return result
result=semkg_api.query(query_)
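
The query string above hard-codes the labels "car" and "truck". As a sketch, a small helper could generate the same pattern for any list of labels. The function build_query is hypothetical and not part of the vision_utils package; it only reuses the prefix and predicates that appear in the query above, and it assumes labels are plain strings without quote characters.

```python
PREFIX = "prefix cv:<http://vision.semkg.org/onto/v0.1/>"


def build_query(labels, limit=100):
    """Build a SPARQL query for images annotated with every label in labels.

    Assumes each label is a plain string with no quote characters.
    """
    patterns = []
    for i, label in enumerate(labels, start=1):
        # One annotation/object pair per label, all tied to the same ?image.
        patterns.append(
            f"    ?ann{i} a cv:Annotation.\n"
            f"    ?ann{i} cv:isAnnotationOfImage ?image.\n"
            f"    ?ann{i} cv:hasAnnotatedObject ?obj{i}.\n"
            f'    ?obj{i} cv:hasLabel "{label}".'
        )
    patterns.append("    ?image cv:hasLocalPath ?localPath.")
    body = "\n".join(patterns)
    return (
        f"{PREFIX}\nSELECT DISTINCT ?image\nWHERE {{\n{body}\n}}\nLIMIT {limit}"
    )


print(build_query(["car", "truck"], limit=10))
```

The returned string can be passed to semkg_api.query in the same way as query_.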

The query results are stored in the result variable, which is a dictionary holding metadata for the matched images. Now we’ll plot some of the returned images.

#display sample images
rows=3
cols=4
f, ax_arr = plt.subplots(rows, cols, figsize=(16,8))
for j, row in enumerate(ax_arr):
    for i, ax in enumerate(row):
        if j*cols+i < len(result['images']):
            image = io.imread(semkg_api.SEMKG_IMAGES_HOST + result['images'][j*cols+i]['image_path'])
            ax.imshow(image)
            ax.axis('off')
 
f.suptitle("Sample images from the query result", fontsize=16)
plt.show()

The result looks like this:

Final words

In this article, we have seen a Python-based framework that retrieves computer vision data using SPARQL, a semantic query language for retrieving and manipulating RDF data. You can experiment with the query language at https://vision.semkg.org/. With the retrieved data, we can go on to tasks such as object detection and image classification. The official repository has more examples.


Vijaysinh Lendave

Vijaysinh is an enthusiast in machine learning and deep learning. He is skilled in ML algorithms, data manipulation, handling and visualization, model building.
