Guide to SOLO and SOLOv2: Ways To Implement Instance Segmentation By Location



SOLO (Segmenting Objects by Locations) is a simple and flexible framework for instance segmentation in computer vision. It is based on the notion of “instance categories”: each pixel within an object instance is assigned a category determined by that instance’s location and size.

SOLO was introduced by Xinlong Wang and Chunhua Shen of The University of Adelaide, Australia, along with Tao Kong, Yuning Jiang and Lei Li of the ByteDance AI Lab. The revised version (v3) of the paper is dated July 2020. See the research paper for full details.

Before moving on to the details of the SOLO approach, let us first understand the concept of instance segmentation.

Instance segmentation is different from semantic segmentation. Semantic segmentation associates every pixel of an image with a class label such as person, flower or car. It treats multiple objects of the same class as a single entity. Instance segmentation, by contrast, treats multiple objects of the same class as distinct individual instances.

[Figure: semantic segmentation vs. instance segmentation]

 As shown in the above figure, semantic segmentation identifies all the entities belonging to the ‘person’ category whereas instance segmentation identifies each individual within the category as a different entity (person1, person2 etc.).

All the objects in an image belong to a fixed set of semantic categories, so semantic segmentation can easily be formulated as a dense per-pixel classification problem. The number of instances in each semantic category, however, varies from image to image, which makes it hard to predict instance labels directly with the same per-pixel paradigm.
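To make the distinction concrete, here is a small NumPy sketch on a hand-made 4x6 "image". The arrays and the class id 1 for 'person' are invented for illustration and do not come from any dataset.

```python
import numpy as np

# Toy semantic map: 0 = background, 1 = 'person' (class ids are illustrative).
semantic = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
])

# Toy instance map for the same pixels: each person gets its own id.
instance = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])

# Semantic segmentation: the label set is fixed, so this is per-pixel classification.
semantic_classes = np.unique(semantic[semantic > 0])

# Instance segmentation: one binary mask per object, and the number of
# objects changes from image to image.
instance_ids = np.unique(instance[instance > 0])
instance_masks = {int(i): instance == i for i in instance_ids}

print("semantic classes:", semantic_classes.tolist())
print("instance ids:", instance_ids.tolist())
print("pixels per instance:", {k: int(m.sum()) for k, m in instance_masks.items()})
```

Both people share the single semantic class, but the instance map needs as many masks as there are people, which is exactly the variable-length output that per-pixel classification does not handle directly.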

Central Idea of SOLO

SOLO reformulates instance segmentation as two classification tasks performed simultaneously. It first divides the input image into a uniform grid of S × S cells. If the centre of an object falls into a grid cell, that cell is responsible for the following two tasks:

  1. Predict the semantic category of the object
  2. Segment that object’s instance mask

SOLO differentiates object instances in an image based on ‘instance categories’ i.e., the quantized centre locations and object sizes. The concept of instance categories allows us to Segment Objects by LOcations and hence the name, SOLO.
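The mapping from an object's centre to its instance category can be sketched as follows. This is a simplified illustration, not the repository's code: the function name and the example numbers are hypothetical, and the real model also takes object size into account (for example by assigning objects to different pyramid levels), which is omitted here. The channel index k = i · S + j follows the convention described in the SOLO paper.

```python
def center_to_instance_category(cx, cy, img_w, img_h, S):
    """Map an object's centre (in pixels) to its grid cell and mask channel.

    Returns (row i, column j, channel k) with k = i * S + j, all zero-based.
    Hypothetical helper for illustration only.
    """
    col = min(int(cx / img_w * S), S - 1)  # j: horizontal cell index
    row = min(int(cy / img_h * S), S - 1)  # i: vertical cell index
    k = row * S + col                      # channel of the mask branch
    return row, col, k


S = 5  # grid size (illustrative value)
row, col, k = center_to_instance_category(cx=320, cy=100, img_w=640, img_h=480, S=S)
print(f"cell ({row}, {col}) -> mask channel {k} of {S * S}")

# With C semantic classes, the two outputs would have these shapes:
#   category branch: S x S x C   (one class distribution per cell)
#   mask branch:     H x W x S^2 (one full-image mask per cell)
```

Because each cell owns exactly one mask channel, predicting "which instance" reduces to predicting "which cell", which is a classification problem.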

Recent instance segmentation methods follow one of the two approaches: 

  1. Top-down approach: Detect bounding boxes around the object(s) and then segment the instance mask in each bounding box to distinguish separate instances of the object (called ‘detect-then-segment’ approach)
  2. Bottom-up approach: Assign an embedding vector to each pixel, pulling together the pixels of the same instance and pushing apart the pixels of different instances so that an affinity relationship is learned. Then group similar pixels to delineate instances.

The above paradigms are step-wise and, more importantly, ‘indirect’: they rely either on precise bounding box detection or on embedding learning followed by a pixel grouping step. SOLO, in contrast, is a ‘direct’ approach that learns from full instance mask annotations.

Practical implementation of SOLO


The installation process is based on MMDetection (v1.0.0) (MMDetection is an open-source toolbox based on PyTorch and used for object detection).


 Script for setting up SOLO with conda and linking the dataset path is as follows:

Create conda virtual environment and activate it

 conda create -n solo python=3.7 -y
 conda activate solo 

Install PyTorch and torchvision

conda install -c pytorch pytorch torchvision -y

Install Cython (used when building the package)

conda install cython -y

Clone the SOLO GitHub repository

 git clone
 cd SOLO 

Install build requirements followed by SOLO

 pip install -r requirements/build.txt
 pip install "git+"
 pip install -v -e .

Link the dataset path

 mkdir data
 ln -s DATASET_PATH data 

Once the installation has completed successfully, download the required model (list of provided models is available here) and then run a Python script along the following lines:

from mmdet.apis import init_detector, inference_detector, show_result_pyplot, show_result_ins

Define the config file path e.g.

config = '../config/solo/'

Download checkpoint from the model zoo and place it in ‘checkpoints/’

checkpt = '../checkpoints/DECOUPLED_SOLO_R50_3x.pth'

Build the model from config file and checkpoint file

model = init_detector(config, checkpt, device='cuda:0')

Test a single image, say abc.jpg

 test_image = 'abc.jpg'
 test_result = inference_detector(model, test_image)
 show_result_ins(test_image, test_result, model.CLASSES, score_thr=0.25)

Train with a single GPU

python tools/ ${CONFIG_FILE}

Train with multiple GPUs

./tools/ ${CONFIG_FILE} ${GPU_NUM}

Test with a single GPU

python tools/ ${CONFIG_FILE} ${CHECKPOINT_FILE} --show --out  ${OUTPUT_FILE} --eval segm

Test with multiple GPUs 

./tools/ ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM}  --show --out  ${OUTPUT_FILE} --eval segm

Visualize the results

python tools/ ${CONFIG_FILE} ${CHECKPOINT_FILE} --show --save_dir  ${SAVE_DIRECTORY}

Source: GitHub repository

Pros of SOLO

  • Totally box-free: SOLO is not restricted by (anchor) box locations and scales and hence benefits from the inherent advantages of Fully Convolutional Networks (FCNs).
  • Direct instance segmentation: SOLO takes an image as input and directly outputs instance masks and the corresponding semantic class probabilities in a fully convolutional, box-free and grouping-free paradigm.

Limitations of SOLO

  • inefficient mask representation and embedding learning
  • not high enough resolution for finer mask predictions
  • slow mask Non-Maximum Suppression (NMS)

The above-mentioned bottlenecks of SOLO are eliminated by a dynamic and faster framework called SOLOv2.

Overview of SOLOv2

SOLOv2 was proposed by Xinlong Wang and Chunhua Shen of The University of Adelaide (Australia), Rufeng Zhang of Tongji University (China) as well as Tao Kong and Lei Li of ByteDance AI Lab in October, 2020. Refer to the SOLOv2 research paper to understand its working and underlying concepts.

SOLOv2 is a dynamic scheme for segmenting objects by locations. It splits mask learning into two parts: convolution kernel learning and feature learning. It predicts the mask kernels dynamically from the input, assigning appropriate location categories to different pixels, and constructs a unified, high-resolution mask feature representation for instance-aware segmentation. The mask kernels and the mask features can be learned separately and efficiently. To suppress duplicate predictions, it employs the Matrix NMS algorithm.
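
The two ideas in this paragraph can be sketched in NumPy. This is an illustrative approximation rather than the official implementation: the function names, array shapes and `sigma` value are assumptions, the kernels are treated as 1x1 convolutions, and the Matrix NMS shown here uses the Gaussian decay form from the SOLOv2 paper while ignoring class labels for brevity.

```python
import numpy as np


def dynamic_masks(features, kernels):
    """Apply predicted 1x1 kernels to a shared mask feature map.

    features: (E, H, W) mask features; kernels: (N, E), one per candidate cell.
    Returns (N, H, W) soft masks in [0, 1].
    """
    E, H, W = features.shape
    logits = kernels @ features.reshape(E, H * W)  # a 1x1 conv is a matrix product
    return (1.0 / (1.0 + np.exp(-logits))).reshape(-1, H, W)


def matrix_nms(scores, masks, sigma=0.5):
    """Decay the scores of overlapping masks in one pass (label-agnostic sketch).

    scores: (N,) confidences; masks: (N, H, W) binary masks.
    Returns (order, decayed_scores), where order sorts the inputs by score.
    """
    order = np.argsort(-scores)
    scores = scores[order]
    flat = masks[order].reshape(len(scores), -1).astype(np.float64)
    inter = flat @ flat.T
    areas = flat.sum(axis=1)
    union = areas[:, None] + areas[None, :] - inter
    # iou[i, j] with i < j: overlap of mask j with the higher-scoring mask i.
    iou = np.triu(inter / np.maximum(union, 1e-9), k=1)
    # For each mask i, its largest overlap with any higher-scoring mask;
    # used to discount i's influence if i is itself likely a duplicate.
    compensate = iou.max(axis=0)[:, None]
    decay = np.exp(-(iou ** 2 - compensate ** 2) / sigma).min(axis=0)
    return order, scores * decay


# Dynamic masks from random stand-in features and kernels.
rng = np.random.default_rng(0)
soft = dynamic_masks(rng.normal(size=(8, 4, 4)), rng.normal(size=(3, 8)))
print("soft mask shape:", soft.shape)

# Matrix NMS on hand-made masks: mask 1 duplicates mask 0, mask 2 is separate.
masks = np.zeros((3, 4, 4), dtype=bool)
masks[0, :2, :2] = True
masks[1, :2, :2] = True
masks[2, 2:, 2:] = True
order, decayed = matrix_nms(np.array([0.9, 0.8, 0.7]), masks)
print("decayed scores:", decayed.round(3))
```

In the toy example the duplicate mask has its score decayed sharply while the two distinct masks keep theirs, and everything is computed with matrix operations instead of a sequential suppression loop.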

Practical implementation of SOLOv2

Click here and follow the installation steps before proceeding further!

Run the following command lines:

 wget -O SOLOv2_R50_3x.pth
 python demo/ \
     --config-file configs/SOLOv2/R50_3x.yaml \
     --input input1.jpg input2.jpg \
     --opts MODEL.WEIGHTS SOLOv2_R50_3x.pth 

Set up the required dataset e.g. MS-COCO

Train the model on COCO

 OMP_NUM_THREADS=1 python tools/ \
     --config-file configs/SOLOv2/R50_3x.yaml \
     --num-gpus 8 \
     OUTPUT_DIR training_dir/SOLOv2_R50_3x 

Evaluate the model on COCO

 OMP_NUM_THREADS=1 python tools/ \
     --config-file configs/SOLOv2/R50_3x.yaml \
     --eval-only \
     --num-gpus 8 \
     OUTPUT_DIR training_dir/SOLOv2_R50_3x \
     MODEL.WEIGHTS training_dir/SOLOv2_R50_3x/model_final.pth 

Source: GitHub repository

