Panoptic segmentation is an image segmentation approach used in computer vision tasks. It unifies two distinct concepts used to segment images, namely semantic segmentation and instance segmentation.
The panoptic segmentation technique was introduced in April 2019 by Alexander Kirillov, Kaiming He, Ross Girshick and Piotr Dollár of Facebook AI Research (FAIR), together with Carsten Rother of HCI/IWR, Heidelberg University (Germany); Kirillov is a member of both organizations.
Let us first understand semantic segmentation and instance segmentation approaches in order to have clarity about panoptic segmentation.
A computer vision project aims at developing a deep learning model which can accurately and precisely detect real-world objects in input data in the form of images or videos. Such models typically rely on image segmentation, which delineates pixel-level boundaries for object detection. There are two effective yet fundamentally different approaches to image segmentation:
- Semantic segmentation – It refers to the task of identifying different classes of objects in an image. It broadly classifies objects into semantic categories such as person, book, flower, car and so on.
- Instance segmentation – It segments different instances of each semantic category and thus appears as an extension of semantic segmentation. For instance, if the semantic segmentation step identifies a flock of birds in an image, instance segmentation goes further and identifies the individual birds in the flock (i.e. the instances of the ‘bird’ semantic category).
This article talks about an advanced image segmentation approach called panoptic segmentation, which combines the two techniques mentioned above. In other words, it semantically distinguishes different objects and also identifies separate instances of each kind of object in the input image. It offers a global view of image segmentation (category-wise as well as instance-wise), hence the name ‘panoptic’ (meaning “showing or seeing everything at once”).
Before going into the details of panoptic segmentation, let us understand two important terminologies germane to image segmentation.
- Things – Any countable object is referred to as a thing in computer vision projects. For example, a person, cat, car, key or ball is a thing.
- Stuff – An uncountable amorphous region of identical texture is known as stuff; for instance, road, water or sky.
The study of things falls under instance segmentation, while the study of stuff is a semantic segmentation task.
Panoptic segmentation assigns two labels to each pixel of an image: (i) a semantic label and (ii) an instance ID. Pixels having the same label belong to the same semantic class, and instance IDs differentiate the instances of that class. Unlike instance segmentation, panoptic segmentation assigns exactly one label-and-ID pair to each pixel, which means there are no overlapping instances.
Suppose, some naively encoded pixel values are as follows:
39001, 39002, 4, 5.
Here, (pixel_value // 1000) gives the semantic label while (pixel_value % 1000) gives the instance ID. So for both 39001 and 39002, the semantic label is 39, i.e. both pixels are assigned the same semantic class; but their instance IDs differ (1 and 2 respectively), which shows that they are different instances of the class labelled 39. The pixels with values 4 and 5, which carry no instance ID, belong to stuff classes.
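This naive decoding can be sketched as follows. Note that this is an illustrative assumption based on the example values above (things encoded as semantic_label * 1000 + instance_id, stuff pixels storing the bare semantic label), not the real COCO panoptic format:

```python
def decode(pixel_value):
    """Decode the naive panoptic encoding used in the example above."""
    if pixel_value >= 1000:                    # thing: carries an instance id
        return pixel_value // 1000, pixel_value % 1000
    return pixel_value, None                   # stuff: no instance id

for v in (39001, 39002, 4, 5):
    print(v, decode(v))    # 39001 and 39002 share semantic label 39
```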
Practical implementation of panoptic segmentation
The following code illustrates panoptic segmentation performed on the MS-COCO dataset using the PyTorch Python library and Detectron2 (a PyTorch-based modular library by Facebook AI Research (FAIR) for implementing object detection algorithms, and a rewrite of the original Detectron library). We also use the DETR (DEtection TRansformer) framework introduced by FAIR, which views object detection as a direct set prediction problem.
If you are not yet familiar with Detectron, Detectron2 and DETR, it is worth reading up on them before proceeding.
Import the required libraries
```python
from PIL import Image
import requests
import io
import math
import matplotlib.pyplot as plt
%config InlineBackend.figure_format = 'retina'
import torch
from torch import nn
from torchvision.models import resnet50
import torchvision.transforms as T
import numpy
torch.set_grad_enabled(False);
import itertools
import seaborn as sns
```
Install the Panoptic API from GitHub for panoptic inference
```python
! pip install git+https://github.com/cocodataset/panopticapi.git
```
Import the installed API
```python
import panopticapi
from panopticapi.utils import id2rgb, rgb2id
```
List of COCO semantic classes:
```python
CLASSES = [
    'N/A', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A',
    'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse',
    'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack',
    'umbrella', 'N/A', 'N/A', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis',
    'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
    'skateboard', 'surfboard', 'tennis racket', 'bottle', 'N/A', 'wine glass',
    'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
    'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake',
    'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table', 'N/A',
    'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard',
    'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator',
    'N/A', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier',
    'toothbrush'
]
```
Enumerate the above classes (the Detectron2 model uses a different numbering convention, so we need to remap the IDs)
```python
coco2d2 = {}
count = 0
for i, c in enumerate(CLASSES):
    if c != "N/A":
        coco2d2[i] = count
        count += 1
```
Perform standard PyTorch mean-std input image normalization
```python
transform = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
```
Load a pre-trained model from torch hub and request the post-processor
```python
model, postprocessor = torch.hub.load('facebookresearch/detr', 'detr_resnet101_panoptic',
                                      pretrained=True, return_postprocessor=True,
                                      num_classes=250)
model.eval();
```
Retrieve an image from the COCO validation set for testing
```python
url = "http://images.cocodataset.org/val2017/000000281759.jpg"
im = Image.open(requests.get(url, stream=True).raw)
```
Mean-std normalize the input testing image (batch-size: 1)
```python
img = transform(im).unsqueeze(0)
out = model(img)
```
Compute the probability score for each possible class, excluding the “no-object” class (the last one)
```python
scores = out["pred_logits"].softmax(-1)[..., :-1].max(-1)[0]
```
Threshold the predictions, keeping only the masks with confidence above 0.85

```python
keep = scores > 0.85
```
Plot the masks satisfying the confidence level condition
```python
ncols = 5
fig, axs = plt.subplots(ncols=ncols, nrows=math.ceil(keep.sum().item() / ncols),
                        figsize=(18, 10))
for line in axs:
    for a in line:
        a.axis('off')
for i, mask in enumerate(out["pred_masks"][keep]):
    ax = axs[i // ncols, i % ncols]
    ax.imshow(mask, cmap="cividis")
    ax.axis('off')
fig.tight_layout()
```
Now merge the individual predictions obtained above into a unified panoptic segmentation. For that, we use DETR’s postprocessor.
The post-processor requires as input the target size of predictions (image size here)
```python
result = postprocessor(out, torch.as_tensor(img.shape[-2:]).unsqueeze(0))[0]
```
Visualize the panoptic segmentation’s results
The segmentation is stored in a special-format png
```python
panoptic_seg = Image.open(io.BytesIO(result['png_string']))
panoptic_seg = numpy.array(panoptic_seg, dtype=numpy.uint8).copy()
```
Retrieve the segment ID corresponding to each mask
```python
panoptic_seg_id = rgb2id(panoptic_seg)
```
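To see what `rgb2id` does under the hood, here is an illustrative re-implementation (a sketch, not panopticapi's actual code): the COCO panoptic PNG packs each segment ID across the three colour channels, so that id = R + 256 * G + 256² * B.

```python
import numpy as np

def rgb2id_sketch(color):
    """Recover segment ids from a COCO panoptic RGB image (illustrative)."""
    color = np.asarray(color, dtype=np.uint32)
    # Little-endian packing across channels: R is the low byte
    return color[..., 0] + 256 * color[..., 1] + 256 ** 2 * color[..., 2]

# A tiny 1x2 "image": the first pixel encodes id 266 (R=10, G=1),
# the second encodes id 3 (R=3)
img = np.array([[[10, 1, 0], [3, 0, 0]]], dtype=np.uint8)
print(rgb2id_sketch(img))
```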
Colour each mask individually and plot the visualization
```python
# Cycle through seaborn's colour palette, one colour per segment id
palette = itertools.cycle(sns.color_palette())
panoptic_seg[:, :, :] = 0
for id in range(panoptic_seg_id.max() + 1):
    panoptic_seg[panoptic_seg_id == id] = numpy.asarray(next(palette)) * 255
plt.figure(figsize=(15, 15))
plt.imshow(panoptic_seg)
plt.axis('off')
plt.show()
```
Output visualization will be as follows:
Use Detectron2’s plotting utilities to better visualize the above panoptic segmentation results.
Import the utilities
```python
!pip install detectron2==0.1.3 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.5/index.html
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog
from google.colab.patches import cv2_imshow
```
Extract the segments information and the panoptic result from DETR’s prediction
```python
from copy import deepcopy
segments_info = deepcopy(result["segments_info"])
```
Store the panoptic predictions in a special format png
```python
panoptic_seg = Image.open(io.BytesIO(result['png_string']))
final_w, final_h = panoptic_seg.size
```
Convert the png into segment id map
```python
panoptic_seg = numpy.array(panoptic_seg, dtype=numpy.uint8)
panoptic_seg = torch.from_numpy(rgb2id(panoptic_seg))
```
Convert the predicted category IDs to Detectron2’s contiguous class IDs
```python
meta = MetadataCatalog.get("coco_2017_val_panoptic_separated")
for i in range(len(segments_info)):
    c = segments_info[i]["category_id"]
    segments_info[i]["category_id"] = (
        meta.thing_dataset_id_to_contiguous_id[c]
        if segments_info[i]["isthing"]
        else meta.stuff_dataset_id_to_contiguous_id[c]
    )
```
Visualize the improved prediction results
```python
v = Visualizer(numpy.array(im.copy().resize((final_w, final_h)))[:, :, ::-1],
               meta, scale=1.0)
v._default_font_size = 20
v = v.draw_panoptic_seg_predictions(panoptic_seg, segments_info, area_threshold=0)
cv2_imshow(v.get_image())
```
Output visualization will be as follows:
The Google Colab notebook for the above implementation code can be found here.
- To get an in-depth understanding of the panoptic segmentation technique, read the original research paper.