Panoptic segmentation is an image segmentation approach used in computer vision tasks. It unifies two distinct concepts used to segment images, namely semantic segmentation and instance segmentation.
The panoptic segmentation technique was introduced in April 2019 by Alexander Kirillov, Kaiming He, Ross Girshick and Piotr Dollár of Facebook AI Research (FAIR), together with Carsten Rother of HCI/IWR, Heidelberg University (Germany); Kirillov is a member of both organizations.
Let us first understand semantic segmentation and instance segmentation approaches in order to have clarity about panoptic segmentation.
A computer vision project aims at developing a deep learning model which can accurately and precisely detect real-world objects in input data in the form of images or videos. Such models typically rely on image segmentation, which delineates pixel-level boundaries for object detection. There are two effective yet fundamentally different approaches to image segmentation:
- Semantic segmentation – It refers to the task of identifying different classes of objects in an image. It broadly classifies objects into semantic categories such as person, book, flower, car and so on.
- Instance segmentation – It segments different instances of each semantic category and thus appears as an extension of semantic segmentation. For instance, if the semantic segmentation step identifies a flock of birds in an image, instance segmentation goes further and identifies the individual birds in the flock (i.e. the instances of the ‘bird’ semantic category).
This article talks about an advanced image segmentation approach called panoptic segmentation, which combines the two techniques mentioned above. In other words, it semantically distinguishes different objects and also identifies separate instances of each kind of object in the input image. It offers a global view of image segmentation (category-wise as well as instance-wise), hence the name ‘panoptic’ (meaning “showing or seeing everything at once”).
Before going into the details of panoptic segmentation, let us understand two important terminologies germane to image segmentation.
- Things – Any countable object is referred to as a thing in computer vision projects. For example, a person, cat, car, key or ball is a thing.
- Stuff – An uncountable amorphous region of identical texture is known as stuff; for instance, road, water or sky.
The study of things falls under instance segmentation, while the study of stuff is a semantic segmentation task.
Panoptic segmentation assigns two labels to each pixel of an image: (i) a semantic label and (ii) an instance ID. Pixels having the same label belong to the same semantic class, and instance IDs differentiate the instances of that class. Unlike instance segmentation, panoptic segmentation assigns exactly one label-and-ID pair to each pixel, which means there are no overlapping instances.
Suppose, some naively encoded pixel values are as follows:
39001, 39002, 4, 5.
Here, (pixel_value // 1000) gives the semantic label while (pixel_value % 1000) gives the instance ID. So for both 39001 and 39002, the semantic label is 39, i.e. both pixels are assigned the same semantic class; but their instance IDs differ (1 and 2 respectively), which shows that they are different instances of the class labelled 39. The pixels with values 4 and 5, which carry no instance ID, belong to stuff classes.
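This naive decoding can be sketched as follows. Note that this is an illustrative assumption based on the example values above (things encoded as semantic_label * 1000 + instance_id, stuff pixels storing the bare semantic label), not the real COCO panoptic format:

```python
def decode(pixel_value):
    """Decode the naive panoptic encoding used in the example above."""
    if pixel_value >= 1000:                    # thing: carries an instance id
        return pixel_value // 1000, pixel_value % 1000
    return pixel_value, None                   # stuff: no instance id

for v in (39001, 39002, 4, 5):
    print(v, decode(v))    # 39001 and 39002 share semantic label 39
```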
Practical implementation of panoptic segmentation
The following code illustrates panoptic segmentation performed on the MS-COCO dataset using the PyTorch Python library and Detectron2 (a PyTorch-based modular library by Facebook AI Research (FAIR) for implementing object detection algorithms, and a rewrite of the original Detectron library). We also use the DETR (DEtection TRansformer) framework introduced by FAIR, which views object detection as a direct set prediction problem.
If you are not yet familiar with Detectron, Detectron2 and DETR, it is worth reading up on them before proceeding.
Import the required libraries
```python
from PIL import Image
import requests
import io
import math
import matplotlib.pyplot as plt
%config InlineBackend.figure_format = 'retina'
import torch
from torch import nn
from torchvision.models import resnet50
import torchvision.transforms as T
import numpy
torch.set_grad_enabled(False);
import itertools
import seaborn as sns
```
Install the Panoptic API from GitHub for panoptic inference
```python
! pip install git+https://github.com/cocodataset/panopticapi.git
```
Import the installed API
```python
import panopticapi
from panopticapi.utils import id2rgb, rgb2id
```
List of COCO semantic classes:
```python
CLASSES = [
    'N/A', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A',
    'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse',
    'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'N/A', 'backpack',
    'umbrella', 'N/A', 'N/A', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis',
    'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
    'skateboard', 'surfboard', 'tennis racket', 'bottle', 'N/A', 'wine glass',
    'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
    'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake',
    'chair', 'couch', 'potted plant', 'bed', 'N/A', 'dining table', 'N/A',
    'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard',
    'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator',
    'N/A', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier',
    'toothbrush'
]
```
Enumerate the above classes (the Detectron2 model uses a different numbering convention, so we need to remap the IDs)
```python
coco2d2 = {}
count = 0
for i, c in enumerate(CLASSES):
    if c != "N/A":
        coco2d2[i] = count
        count += 1
```
Perform standard PyTorch mean-std input image normalization
```python
transform = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
```
Load a pre-trained model from torch hub and request the post-processor
```python
model, postprocessor = torch.hub.load('facebookresearch/detr', 'detr_resnet101_panoptic',
                                      pretrained=True, return_postprocessor=True,
                                      num_classes=250)
model.eval();
```
Retrieve an image from the COCO validation set for testing
```python
url = "http://images.cocodataset.org/val2017/000000281759.jpg"
im = Image.open(requests.get(url, stream=True).raw)
```
Mean-std normalize the input testing image (batch-size: 1)
```python
img = transform(im).unsqueeze(0)
out = model(img)
```
Compute the probability score for each possible class, excluding the “no-object” class (the last one)
```python
scores = out["pred_logits"].softmax(-1)[..., :-1].max(-1)[0]
```
Threshold the predictions, keeping only the masks with confidence above 0.85

```python
keep = scores > 0.85
```
Plot the masks satisfying the confidence level condition
```python
ncols = 5
fig, axs = plt.subplots(ncols=ncols, nrows=math.ceil(keep.sum().item() / ncols),
                        figsize=(18, 10))
for line in axs:
    for a in line:
        a.axis('off')
for i, mask in enumerate(out["pred_masks"][keep]):
    ax = axs[i // ncols, i % ncols]
    ax.imshow(mask, cmap="cividis")
    ax.axis('off')
fig.tight_layout()
```
Now merge the individual predictions obtained above into a unified panoptic segmentation. For that, we use DETR’s postprocessor.
The post-processor requires as input the target size of predictions (image size here)
```python
result = postprocessor(out, torch.as_tensor(img.shape[-2:]).unsqueeze(0))[0]
```
Visualize the panoptic segmentation’s results
The segmentation is stored in a special-format png
```python
panoptic_seg = Image.open(io.BytesIO(result['png_string']))
panoptic_seg = numpy.array(panoptic_seg, dtype=numpy.uint8).copy()
```
Retrieve the segment ID corresponding to each mask
```python
panoptic_seg_id = rgb2id(panoptic_seg)
```
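To see what `rgb2id` does under the hood, here is an illustrative re-implementation (a sketch, not panopticapi's actual code): the COCO panoptic PNG packs each segment ID across the three colour channels, so that id = R + 256 * G + 256² * B.

```python
import numpy as np

def rgb2id_sketch(color):
    """Recover segment ids from a COCO panoptic RGB image (illustrative)."""
    color = np.asarray(color, dtype=np.uint32)
    # Little-endian packing across channels: R is the low byte
    return color[..., 0] + 256 * color[..., 1] + 256 ** 2 * color[..., 2]

# A tiny 1x2 "image": the first pixel encodes id 266 (R=10, G=1),
# the second encodes id 3 (R=3)
img = np.array([[[10, 1, 0], [3, 0, 0]]], dtype=np.uint8)
print(rgb2id_sketch(img))
```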
Colour each mask individually and plot the visualization
```python
# Cycle through seaborn's colour palette, one colour per segment id
palette = itertools.cycle(sns.color_palette())
panoptic_seg[:, :, :] = 0
for id in range(panoptic_seg_id.max() + 1):
    panoptic_seg[panoptic_seg_id == id] = numpy.asarray(next(palette)) * 255
plt.figure(figsize=(15, 15))
plt.imshow(panoptic_seg)
plt.axis('off')
plt.show()
```
Output visualization will be as follows:
Use Detectron2’s plotting utilities to better visualize the above panoptic segmentation results.
Import the utilities
```python
!pip install detectron2==0.1.3 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.5/index.html
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog
from google.colab.patches import cv2_imshow
```
Extract the segments information and the panoptic result from DETR’s prediction
```python
from copy import deepcopy
segments_info = deepcopy(result["segments_info"])
```
Store the panoptic predictions in a special format png
```python
panoptic_seg = Image.open(io.BytesIO(result['png_string']))
final_w, final_h = panoptic_seg.size
```
Convert the png into segment id map
```python
panoptic_seg = numpy.array(panoptic_seg, dtype=numpy.uint8)
panoptic_seg = torch.from_numpy(rgb2id(panoptic_seg))
```
Convert the predicted category IDs to Detectron2’s contiguous class IDs
```python
meta = MetadataCatalog.get("coco_2017_val_panoptic_separated")
for i in range(len(segments_info)):
    c = segments_info[i]["category_id"]
    segments_info[i]["category_id"] = (
        meta.thing_dataset_id_to_contiguous_id[c]
        if segments_info[i]["isthing"]
        else meta.stuff_dataset_id_to_contiguous_id[c]
    )
```
Visualize the improved prediction results
```python
v = Visualizer(numpy.array(im.copy().resize((final_w, final_h)))[:, :, ::-1],
               meta, scale=1.0)
v._default_font_size = 20
v = v.draw_panoptic_seg_predictions(panoptic_seg, segments_info, area_threshold=0)
cv2_imshow(v.get_image())
```
Output visualization will be as follows:
The Google Colab notebook for the above implementation code can be found here.
- To get an in-depth understanding of the panoptic segmentation technique, read the original research paper.