Introduction To YolactEdge For Real-time Object Segmentation On Edge Devices

YolactEdge is one of the first competitive instance segmentation techniques that runs on small edge devices at real-time speeds. With a ResNet-101 backbone and 550×550 input images, it reaches up to 30 FPS on an Nvidia Jetson AGX Xavier and 172 FPS on an RTX 2080 Ti. The paper, YolactEdge: Real-time Instance Segmentation on the Edge, was authored by Haotian Liu, Rafael A. Rivera Soto, Fanyi Xiao, and Yong Jae Lee and released in December 2020. The code and models are open-sourced on GitHub.

The main contributions of the authors are:

  • TensorRT optimization that carefully trades off speed and accuracy.
  • A novel feature warping module that exploits temporal redundancy in videos.
  • Integrated YouTube VIS and MS COCO datasets.
  • Produces a 3 to 5x speedup over existing real-time methods while producing competitive mask and box detection accuracy.

To run inference at real-time speeds on edge devices, the authors built on YOLACT, a state-of-the-art image-based real-time instance segmentation method, and made two main changes: one at the algorithm level and one at the system level. At the system level, YolactEdge uses the Nvidia TensorRT inference engine to quantize the network parameters to fewer bits while systematically balancing any tradeoff in accuracy. At the algorithm level, it exploits temporal redundancy in video: it learns to transform and propagate features over time so that the deep network's expensive backbone features do not have to be fully computed on every frame.
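
To give a feel for what quantizing parameters to fewer bits means, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It is not TensorRT's actual calibration procedure; the function names and the example weights are made up for illustration.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale (symmetric scheme)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.01, 1.0]  # hypothetical layer weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

The accuracy cost comes from that rounding error, which is why the precision of each part of the network has to be chosen with care.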


YolactEdge Backbone

YOLACT can be divided into 4 components: 

  1. a feature Backbone.
  2. a feature pyramid network (FPN).
  3. a ProtoNet.
  4. a Prediction Head.
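
As a rough illustration of how the last two components fit together, the sketch below follows the idea described in the original YOLACT paper: the ProtoNet produces a set of prototype masks, the prediction head produces one coefficient per prototype for each detected instance, and the instance mask is the sigmoid of their linear combination. The tiny 2×2 prototypes and the coefficient values are invented for the example, and plain lists stand in for tensors.

```python
import math
from typing import List

Mask = List[List[float]]  # a height x width grid of values


def assemble_mask(prototypes: List[Mask], coefficients: List[float]) -> Mask:
    """Combine prototype masks linearly, then squash each pixel with a sigmoid."""
    height, width = len(prototypes[0]), len(prototypes[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            activation = sum(c * p[y][x] for c, p in zip(coefficients, prototypes))
            row.append(1.0 / (1.0 + math.exp(-activation)))
        mask.append(row)
    return mask


# Two hypothetical prototypes: one fires top-left, the other bottom-right.
prototypes = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
]
coefficients = [4.0, -4.0]  # hypothetical output of the prediction head for one instance

mask = assemble_mask(prototypes, coefficients)
binary = [[value > 0.5 for value in row] for row in mask]
print(binary)
```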

As shown in the above figure, YolactEdge extends YOLACT to video by transforming a subset of the features from keyframes (left) to non-keyframes (right), which reduces expensive backbone computation. Specifically, on non-keyframes it computes only the features that are cheap yet crucial for mask prediction, which largely accelerates the method while retaining accuracy on those frames. The figure uses blue, orange, and grey to indicate computed, transformed, and skipped blocks, respectively.
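
The keyframe idea can be sketched as a simple schedule. The example assumes a fixed keyframe interval, which is a simplification chosen for illustration; `plan_frames` is a hypothetical helper, not part of the YolactEdge code.

```python
from typing import List, Tuple


def plan_frames(num_frames: int, keyframe_interval: int) -> List[Tuple[int, str, int]]:
    """Return (frame index, role, index of the keyframe whose features are reused)."""
    if keyframe_interval < 1:
        raise ValueError("keyframe_interval must be at least 1")
    plan = []
    last_keyframe = 0
    for index in range(num_frames):
        if index % keyframe_interval == 0:
            # Keyframe: the full backbone is computed.
            last_keyframe = index
            plan.append((index, "keyframe", index))
        else:
            # Non-keyframe: features are transformed from the latest keyframe.
            plan.append((index, "non-keyframe", last_keyframe))
    return plan


plan = plan_frames(num_frames=7, keyframe_interval=3)
full_backbone_runs = sum(1 for _, role, _ in plan if role == "keyframe")
print(full_backbone_runs, "of", len(plan), "frames run the full backbone")
```

With a longer interval, fewer frames pay for the full backbone, at the cost of propagating features over a larger time gap.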



YolactEdge is trained with a batch size of 32 on 4 GPUs, starting from ImageNet pre-trained weights. First, the authors pre-trained YOLACT with SGD for 500k iterations. Then, they froze the YOLACT weights and trained FeatFlowNet on the FlyingChairs dataset. Finally, they fine-tuned all weights except the ResNet backbone for 200k iterations.
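
The three training stages can be written down as a small schedule. This is only a sketch of the description above: the module names and the `Stage` class are hypothetical, and the iteration count for the FeatFlowNet stage is left as `None` because the text does not give one.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical module names, used only to make the freezing logic concrete.
YOLACT_MODULES = ("resnet_backbone", "fpn", "protonet", "prediction_head")
ALL_MODULES = YOLACT_MODULES + ("featflownet",)


@dataclass(frozen=True)
class Stage:
    name: str
    frozen: Tuple[str, ...]     # modules that receive no gradient updates in this stage
    iterations: Optional[int]   # None where the text gives no number

    def trainable(self) -> Tuple[str, ...]:
        return tuple(m for m in ALL_MODULES if m not in self.frozen)


STAGES = (
    # 1. Pre-train YOLACT with SGD; FeatFlowNet is not trained yet.
    Stage("pretrain_yolact", frozen=("featflownet",), iterations=500_000),
    # 2. Freeze YOLACT and train FeatFlowNet on FlyingChairs.
    Stage("train_featflownet", frozen=YOLACT_MODULES, iterations=None),
    # 3. Fine-tune everything except the ResNet backbone.
    Stage("finetune", frozen=("resnet_backbone",), iterations=200_000),
)

for stage in STAGES:
    print(stage.name, "->", stage.trainable())
```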


To set up YolactEdge, the following requirements and steps apply:

  • The code is written in Python 3.
  • Install PyTorch 1.6.0.
  • Install CUDA 10.2/11.0 and cuDNN 8.0.0.
  • Download the TensorRT 7.1 tar file and install TensorRT by following the official documentation.
  • Install torch2trt:

 git clone <torch2trt repository URL>
 cd torch2trt
 sudo python setup.py install --plugins

Install the other dependencies:

 !pip install cython
 !pip install opencv-python pillow matplotlib
 !pip install "git+<pycocotools repository URL>#egg=pycocotools&subdirectory=PythonAPI"
 !pip install GitPython termcolor tensorboard

Clone the YolactEdge repository and change into its directory:

 git clone <yolact_edge repository URL>
 cd yolact_edge
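
After the steps above, a quick check like the following can confirm that the packages are importable. It is a convenience sketch, not part of the project; the mapping lists the usual import name for each package mentioned above, and `find_missing` is a hypothetical helper.

```python
import importlib.util
from typing import Iterable, List

# Import name -> package as named in the installation steps.
REQUIRED = {
    "torch": "PyTorch",
    "torch2trt": "torch2trt",
    "tensorrt": "TensorRT",
    "Cython": "cython",
    "cv2": "opencv-python",
    "PIL": "pillow",
    "matplotlib": "matplotlib",
    "pycocotools": "pycocotools",
    "git": "GitPython",
    "termcolor": "termcolor",
    "tensorboard": "tensorboard",
}


def find_missing(modules: Iterable[str]) -> List[str]:
    """Return the import names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED)
if missing:
    print("Missing:", ", ".join(f"{REQUIRED[name]} ({name})" for name in missing))
else:
    print("All listed packages are importable.")
```

This only verifies that the modules can be located; it does not check versions or that CUDA and TensorRT work on the device.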

YolactEdge Models

The authors provide baseline YOLACT and YolactEdge models trained on the COCO and YouTube VIS datasets. The table below lists the YouTube VIS models.

Method                 Backbone    mAP    AGX-Xavier FPS   RTX 2080 Ti FPS   weights
YolactEdge (w/o TRT)   R-50-FPN    44.2   10.5             67.0              download
YolactEdge (w/o TRT)   R-101-FPN   46.9   9.5              61.2              download
YouTube VIS models
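
Frames per second can be easier to reason about as time per frame. The short sketch below converts the FPS figures from the YouTube VIS table into milliseconds per frame; `frame_time_ms` is just an illustrative helper.

```python
# (backbone, AGX-Xavier FPS, RTX 2080 Ti FPS) as listed in the YouTube VIS table
ROWS = [
    ("R-50-FPN", 10.5, 67.0),
    ("R-101-FPN", 9.5, 61.2),
]


def frame_time_ms(fps: float) -> float:
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


for backbone, xavier_fps, rtx_fps in ROWS:
    print(
        f"{backbone}: {frame_time_ms(xavier_fps):.1f} ms/frame on AGX-Xavier, "
        f"{frame_time_ms(rtx_fps):.1f} ms/frame on RTX 2080 Ti"
    )
```

For comparison, a 30 FPS real-time target leaves a budget of roughly 33 ms per frame.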

YolactEdge COCO Models

Method   Backbone   mAP   Titan Xp FPS   AGX-Xavier FPS   RTX 2080 Ti FPS   weights
COCO Models

To evaluate the pretrained models, create a ./weights directory, place the corresponding weight file in it, and run the commands below.
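
A small helper along these lines can create the directory and check that a weight file is in place before evaluation starts. The function names are hypothetical, and the demo uses a temporary directory and a placeholder file instead of a real download.

```python
import tempfile
from pathlib import Path


def weight_path(filename: str, weights_dir: str = "./weights") -> Path:
    """Create the weights directory if needed and return the expected file path."""
    directory = Path(weights_dir)
    directory.mkdir(parents=True, exist_ok=True)
    return directory / filename


def is_ready(path: Path) -> bool:
    """True if the weight file exists and is not empty."""
    return path.is_file() and path.stat().st_size > 0


# Demo in a temporary directory so nothing is written to the working tree.
with tempfile.TemporaryDirectory() as tmp:
    target = weight_path("yolact_edge_54_800000.pth", weights_dir=tmp)
    before = is_ready(target)
    target.write_bytes(b"placeholder")  # stands in for the downloaded weights
    after = is_ready(target)

print(before, after)
```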

Evaluation of YolactEdge

The first command converts each component of the trained model to TensorRT using the optimal settings and evaluates it on the YouTube VIS validation set.

 !python3 eval.py --trained_model=./weights/yolact_edge_vid_847_50000.pth
 # Evaluate on the entire COCO validation set.
 # '--yolact_transfer' converts models trained with YOLACT so that they are compatible with YolactEdge.
 !python3 eval.py --yolact_transfer --trained_model=./weights/yolact_edge_54_800000.pth
 # Output COCO-format JSON files for the COCO test-dev set. The command creates './results/bbox_detections.json' and './results/mask_detections.json' for detection and instance segmentation respectively. These files can then be submitted to the website for evaluation.
 !python3 eval.py --yolact_transfer --trained_model=./weights/yolact_edge_54_800000.pth --dataset=coco2017_testdev_dataset --output_coco_json
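
Before submitting the generated files, it can help to take a quick look at what they contain. The sketch below assumes the files follow the standard COCO results layout, that is, a JSON list of records with at least `category_id` and `score` fields; the sample records are invented, and `count_by_category` is a hypothetical helper.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_by_category(path, score_threshold: float = 0.0) -> Counter:
    """Count detections per category, keeping only scores at or above the threshold."""
    with open(path, "r", encoding="utf-8") as handle:
        detections = json.load(handle)
    return Counter(
        d["category_id"] for d in detections if d["score"] >= score_threshold
    )


# Invented records in the assumed COCO results layout (bbox is [x, y, width, height]).
sample = [
    {"image_id": 1, "category_id": 1, "bbox": [10.0, 20.0, 50.0, 80.0], "score": 0.92},
    {"image_id": 1, "category_id": 18, "bbox": [5.0, 5.0, 30.0, 40.0], "score": 0.41},
    {"image_id": 2, "category_id": 1, "bbox": [0.0, 0.0, 25.0, 60.0], "score": 0.15},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "bbox_detections.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    all_counts = count_by_category(path)
    confident_counts = count_by_category(path, score_threshold=0.3)

print(all_counts, confident_counts)
```

On a real run, the path would point at './results/bbox_detections.json' instead of the temporary file.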

Running on Images

 # Display qualitative results on the specified image.
 python eval.py --yolact_transfer --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --image=my_image.png
 # Process an image and save the result to another file.
 python eval.py --yolact_transfer --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --image=input_image.png:output_image.png
 # Process a whole folder of images.
 python eval.py --yolact_transfer --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --images=path/to/input/folder:path/to/output/folder
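
The `--image` and `--images` values above use an `input:output` convention. The following sketch shows one way such a value could be split; it is an illustration of the convention, not the project's own parsing code, and `split_io_spec` is a hypothetical name.

```python
from typing import Optional, Tuple


def split_io_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'input:output' into its two parts; output is None when only an input is given."""
    if ":" in spec:
        source, destination = spec.split(":", 1)
        return source, destination
    return spec, None


print(split_io_spec("my_image.png"))
print(split_io_spec("input_image.png:output_image.png"))
print(split_io_spec("path/to/input/folder:path/to/output/folder"))
```

A split on the first colon like this would misread Windows paths with drive letters such as `C:\images`, so relative paths are the safer choice with this convention.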

On videos

 # Display a video in real-time. "--video_multiframe" sets how many frames are processed at once for improved performance.
 # If video_multiframe > 1, trt_batch_size should be increased to match or exceed it.
 python eval.py --yolact_transfer --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --video_multiframe=2 --trt_batch_size 2 --video=my_video.mp4
 # Display a webcam feed in real-time. If you have multiple webcams, pass the index of the webcam you want instead of 0.
 python eval.py --yolact_transfer --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --video_multiframe=2 --trt_batch_size 2 --video=0
 # Process a video and save the result to another file. This is unoptimized.
 python eval.py --yolact_transfer --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --video=input_video.mp4:output_video.mp4
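
The relationship between the two video flags noted in the comments above can be checked before launching a run. This is a minimal sketch of that rule only; `settings_are_consistent` and `video_flags` are hypothetical helpers.

```python
from typing import List


def settings_are_consistent(video_multiframe: int, trt_batch_size: int) -> bool:
    """The TensorRT batch size should match or exceed the number of frames processed at once."""
    if video_multiframe < 1 or trt_batch_size < 1:
        return False
    return trt_batch_size >= video_multiframe


def video_flags(video_multiframe: int, trt_batch_size: int) -> List[str]:
    """Format the two flags in the same style as the commands above."""
    return [f"--video_multiframe={video_multiframe}", "--trt_batch_size", str(trt_batch_size)]


for frames, batch in [(2, 2), (2, 4), (4, 2)]:
    status = "ok" if settings_are_consistent(frames, batch) else "increase trt_batch_size"
    print(" ".join(video_flags(frames, batch)), "->", status)
```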


YolactEdge offers a new way of approaching real-time instance segmentation on edge devices with limited computation power. It suggests that much of what remains for deploying deep learning models on such hardware is an optimization problem, which approaches like YolactEdge address. To learn more about the project, refer to the paper and the GitHub repository mentioned above.

Mohit Maithani
Mohit is a Data & Technology Enthusiast with good exposure to solving real-world problems in various avenues of IT and Deep learning domain. He believes in solving human's daily problems with the help of technology.
