Introduction To YolactEdge For Real-time Object Segmentation On Edge Device

Mohit Maithani

YolactEdge is one of the first competitive instance segmentation techniques that can run on small edge devices at real-time speeds. It reaches up to 30 FPS on an Nvidia Jetson AGX Xavier and 172 FPS on an RTX 2080 Ti. These figures are reported with a ResNet-101 backbone on 550×550 resolution input images. The paper, YolactEdge: Real-time Instance Segmentation on the Edge, was authored by Haotian Liu, Rafael A. Rivera Soto, Fanyi Xiao, and Yong Jae Lee in December 2020, and the code and models are open-sourced on GitHub.

Some of the new features and things the authors came up with are:

  • TensorRT optimization that carefully trades off speed and accuracy.
  • A novel feature warping module that exploits temporal redundancy in videos.
  • Integrated YouTube VIS and MS COCO datasets.
  • A 3 to 5x speedup over existing real-time methods while producing competitive mask and box detection accuracy.

To run inference at real-time speeds on edge devices, the authors built on YOLACT, a state-of-the-art image-based real-time instance segmentation method, and made two main changes: one at the algorithm level and one at the system level. At the system level, YolactEdge uses the Nvidia TensorRT inference engine to quantize the network parameters to fewer bits while systematically balancing any tradeoff in accuracy. At the algorithm level, it exploits temporal redundancy in video, learning to transform and propagate features over time so that the deep network's expensive backbone features do not have to be fully computed on every frame.
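
To make the "fewer bits" idea concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It is only an illustration of the precision tradeoff: TensorRT's own calibration procedure is more involved, and the function names and sample weights below are assumptions made for this example.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One shared scale for the whole list; guard against an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only to show the rounding error.
weights = [0.50, -1.27, 0.003, 0.9]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("quantized:", quantized)
print("largest rounding error:", max_error)
```

Small values such as 0.003 collapse to zero after rounding, which is the kind of accuracy loss that has to be balanced against the speed gain.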

YolactEdge Backbone

YOLACT can be divided into 4 components: 

  1. a feature backbone.
  2. a feature pyramid network (FPN).
  3. a ProtoNet.
  4. a prediction head.
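
In YOLACT, the ProtoNet produces a set of prototype masks and the prediction head predicts one coefficient per prototype for each detected instance; the instance mask is the sigmoid of their linear combination. The toy sketch below illustrates that step with plain Python lists. The 2×2 prototypes and the coefficients are made-up values, and a real implementation would use tensors.

```python
import math


def assemble_mask(prototypes, coefficients):
    """Linearly combine k prototype masks with k coefficients, then apply a sigmoid."""
    height = len(prototypes[0])
    width = len(prototypes[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            activation = sum(c * p[y][x] for c, p in zip(coefficients, prototypes))
            row.append(1.0 / (1.0 + math.exp(-activation)))
        mask.append(row)
    return mask


# Two hypothetical 2x2 prototypes and one instance's coefficients.
prototypes = [
    [[4.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 4.0]],
]
coefficients = [1.0, -1.0]

mask = assemble_mask(prototypes, coefficients)
# Threshold the soft mask to get a binary instance mask.
binary = [[value > 0.5 for value in row] for row in mask]
print(binary)
```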

As shown in the above figure, YolactEdge extends YOLACT to video by transforming a subset of the features from keyframes (left) to non-keyframes (right), which reduces expensive backbone computation. On non-keyframes it computes only the features that are cheap yet crucial for mask prediction, which largely accelerates the method while retaining accuracy. In the figure, blue, orange, and grey indicate computed, transformed, and skipped blocks, respectively.
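
The keyframe idea can be sketched as a simple frame plan. This example assumes keyframes are picked at a fixed interval, which the article does not state, and `plan_frames` is a hypothetical helper rather than part of the YolactEdge code.

```python
def plan_frames(num_frames, keyframe_interval):
    """Label each frame as a keyframe (full backbone) or non-keyframe (reuse features).

    Returns tuples of (frame index, kind, index of the keyframe whose
    features would be transformed, or None for keyframes).
    """
    plan = []
    last_keyframe = None
    for index in range(num_frames):
        if index % keyframe_interval == 0:
            last_keyframe = index
            plan.append((index, "keyframe", None))
        else:
            plan.append((index, "non-keyframe", last_keyframe))
    return plan


for entry in plan_frames(num_frames=5, keyframe_interval=3):
    print(entry)
```

With an interval of 3, only frames 0 and 3 would pay for the full backbone; the other frames reuse features from the most recent keyframe.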



YolactEdge is trained with a batch size of 32 on 4 GPUs, starting from ImageNet pre-trained weights. First, the authors pre-trained YOLACT with SGD for 500k iterations. Then, they froze the YOLACT weights and trained FeatFlowNet on the FlyingChairs dataset. Finally, they fine-tuned all weights except the ResNet backbone for 200k iterations.
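
The three training stages can be written down as a small schedule. This is only a bookkeeping sketch: the component labels are hypothetical names, the even split of the batch across GPUs is an assumption, and the iteration count for the FeatFlowNet stage is left as `None` because the article does not give it.

```python
TOTAL_BATCH_SIZE = 32
NUM_GPUS = 4

# Assumes the batch is split evenly across GPUs (not stated in the article).
per_gpu_batch = TOTAL_BATCH_SIZE // NUM_GPUS

# Hypothetical labels for the parts of the network.
ALL_COMPONENTS = {"backbone", "fpn", "protonet", "prediction_head", "featflownet"}

stages = [
    {"name": "pre-train YOLACT", "iterations": 500_000,
     "trainable": ALL_COMPONENTS - {"featflownet"}},
    {"name": "train FeatFlowNet on FlyingChairs", "iterations": None,  # not given
     "trainable": {"featflownet"}},
    {"name": "fine-tune", "iterations": 200_000,
     "trainable": ALL_COMPONENTS - {"backbone"}},
]


def frozen_components(stage):
    """Return the components whose weights stay fixed during a stage."""
    return sorted(ALL_COMPONENTS - stage["trainable"])


print("per-GPU batch size:", per_gpu_batch)
for stage in stages:
    print(stage["name"], "| frozen:", frozen_components(stage))
```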


  • YolactEdge is written in Python 3.
  • Install PyTorch 1.6.0.
  • Install CUDA 10.2/11.0 and cuDNN 8.0.0.
  • Download the TensorRT 7.1 tar file and install TensorRT following the official documentation.
  • Install torch2trt:
 git clone
 cd torch2trt
 sudo python install --plugins
 # Install some other dependencies:
 !pip install cython
 !pip install opencv-python pillow matplotlib
 !pip install !git+"egg=pycocotools&subdirectory=PythonAPI"
 !pip install GitPython termcolor tensorboard
 # Clone the repo and change into its directory:
 git clone
 cd yolact_edge 
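
After installation, a quick way to confirm that the dependencies are visible to Python is to look up their import names without importing them. The sketch below uses only the standard library; `find_missing` is a hypothetical helper, and the list assumes the usual import names of the packages above (for example `cv2` for opencv-python and `PIL` for pillow).

```python
import importlib.util
import sys


def find_missing(module_names):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


required = ["torch", "tensorrt", "torch2trt", "cv2", "PIL",
            "matplotlib", "pycocotools", "termcolor"]
missing = find_missing(required)

print("Python", sys.version.split()[0])
print("missing modules:", ", ".join(missing) if missing else "none")
```

This only checks that a module can be located; it does not verify versions or that CUDA and TensorRT work at runtime.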

YolactEdge Models

The authors provide baseline YOLACT and YolactEdge models trained on the COCO and YouTube VIS datasets. The YouTube VIS models are listed below.

Method                Backbone   mAP   AGX-Xavier FPS  RTX 2080 Ti FPS  weights
YolactEdge (w/o TRT)  R-50-FPN   44.2  10.5            67.0             download
YolactEdge (w/o TRT)  R-101-FPN  46.9  9.5             61.2             download
YouTube VIS models

YolactEdge COCO Models

Method  Backbone  mAP  Titan Xp FPS  AGX-Xavier FPS  RTX 2080 Ti FPS  weights
COCO Models

To evaluate the pretrained models, create a ./weights directory, put the corresponding weight file in it, and run the commands below.
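
A minimal sketch of that preparation step in Python, assuming the commands are run from the repository root. `weight_path` is a hypothetical helper; the file name is one of the weight files used in the commands in this article.

```python
from pathlib import Path


def weight_path(filename, weights_dir="./weights"):
    """Create the weights directory if needed and return where the file should go."""
    directory = Path(weights_dir)
    directory.mkdir(parents=True, exist_ok=True)
    return directory / filename


path = weight_path("yolact_edge_54_800000.pth")
if path.exists():
    print("found", path)
else:
    print("download the weight file and place it at", path)
```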

Evaluation of YolactEdge

Convert each component of the trained model to TensorRT using the optimal settings and evaluate on the YouTube VIS validation set:

 !python3 --trained_model=./weights/yolact_edge_vid_847_50000.pth
 # Evaluate on the entire COCO validation set.
 # '--yolact_transfer' is used to convert the models trained with YOLACT to be compatible with YolactEdge.
 !python3 --yolact_transfer --trained_model=./weights/yolact_edge_54_800000.pth
 # Output a COCO file for the COCO test-dev set. The command will create './results/bbox_detections.json' and './results/mask_detections.json' for detection and instance segmentation respectively. These files can then be submitted to the website for evaluation.
 !python3 --yolact_transfer --trained_model=./weights/yolact_edge_54_800000.pth --dataset=coco2017_testdev_dataset --output_coco_json 
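
Before submitting the detection file, it can help to inspect it. The sketch below assumes the file follows the standard COCO results format, a JSON list of records with `image_id`, `category_id`, `bbox` as `[x, y, width, height]`, and `score`. The sample records are invented for illustration, and `load_detections` and `summarize_detections` are hypothetical helpers.

```python
import json
from collections import Counter


def load_detections(path):
    """Read a COCO-style results file (a JSON list of detection records)."""
    with open(path) as handle:
        return json.load(handle)


def summarize_detections(detections, score_threshold=0.3):
    """Count detections per category, ignoring low-confidence ones."""
    kept = [d for d in detections if d["score"] >= score_threshold]
    return Counter(d["category_id"] for d in kept)


# Invented records standing in for load_detections("./results/bbox_detections.json").
sample = [
    {"image_id": 1, "category_id": 18, "bbox": [10.0, 20.0, 50.0, 80.0], "score": 0.91},
    {"image_id": 1, "category_id": 1, "bbox": [5.0, 5.0, 30.0, 60.0], "score": 0.75},
    {"image_id": 2, "category_id": 1, "bbox": [0.0, 0.0, 12.0, 12.0], "score": 0.12},
]
summary = summarize_detections(sample)
print(dict(summary))
```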

Running on Images

 # Display qualitative results on the specified image.
 python --yolact_transfer --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --image=my_image.png
 # Process an image and save it to another file.
 python --yolact_transfer --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --image=input_image.png:output_image.png
 # Process a whole folder of images.
 python --yolact_transfer --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --images=path/to/input/folder:path/to/output/folder 
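
The `--image` and `--images` values above use an `input:output` convention. The sketch below shows one way such a value could be split; it is an assumption about the convention rather than the repository's actual parsing code, and `split_io_argument` is a hypothetical helper.

```python
def split_io_argument(argument):
    """Split 'input:output' into its two parts.

    The output part is None when no colon is present, which corresponds
    to displaying the result instead of saving it.
    """
    if ":" in argument:
        source, destination = argument.split(":", 1)
        return source, destination
    return argument, None


print(split_io_argument("my_image.png"))
print(split_io_argument("input_image.png:output_image.png"))
print(split_io_argument("path/to/input/folder:path/to/output/folder"))
```

A simple split like this would misread paths that themselves contain a colon, such as Windows drive letters.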

Running on Videos

 # Display a video in real-time. "--video_multiframe" will process that many frames at once for improved performance.
 # If video_multiframe > 1, then the trt_batch_size should be increased to match it or surpass it. 
 python --yolact_transfer --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --video_multiframe=2 --trt_batch_size 2 --video=my_video.mp4
 # Display a webcam feed in real-time. If you have multiple webcams pass the index of the webcam you want instead of 0.
 python --yolact_transfer --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --video_multiframe=2 --trt_batch_size 2 --video=0
 # Process a video and save it to another file. This is unoptimized.
 python --yolact_transfer --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --video=input_video.mp4:output_video.mp4 
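
The rule in the comments above, that `trt_batch_size` should match or exceed `video_multiframe`, can be checked before launching a run. The helpers below are hypothetical and only illustrate the constraint and how frames would be grouped; they are not part of the YolactEdge code.

```python
def settings_are_valid(video_multiframe, trt_batch_size):
    """True when the TensorRT batch size can hold one group of frames."""
    return video_multiframe >= 1 and trt_batch_size >= video_multiframe


def group_frames(frames, video_multiframe):
    """Split a list of frames into groups that are processed together."""
    return [frames[start:start + video_multiframe]
            for start in range(0, len(frames), video_multiframe)]


print(settings_are_valid(video_multiframe=2, trt_batch_size=2))
print(settings_are_valid(video_multiframe=4, trt_batch_size=2))
print(group_frames(list(range(5)), video_multiframe=2))
```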


YolactEdge brings real-time instance segmentation to edge devices with limited computation power. Much of what remains in deploying deep learning projects on such hardware is an optimization problem, and approaches like YolactEdge address it. To learn more about the project, see the paper and the GitHub repository mentioned above.


Copyright Analytics India Magazine Pvt Ltd
