How RetinaNet Fixes The Shortcomings Of SSD With Focal Loss

In conventional two-stage object detectors such as R-CNN, a sparse set of candidate object locations is generated first. A CNN then classifies each candidate as one of the foreground classes or as background. One-stage detectors such as SSD skip the proposal step and are applied directly over a regular, dense sampling of object locations, scales and aspect ratios. This makes them faster and simpler, but they had trailed two-stage detectors in accuracy.

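Because one-stage detectors densely cover the entire image with candidate locations (on the order of 100k per image), only a handful of those locations actually contain objects. This creates an extreme class imbalance. The vast majority of candidates are easy background negatives, and their combined loss overwhelms training, so the detector learns little from the rare foreground examples.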

RetinaNet was introduced by Facebook AI Research to tackle this class imbalance in dense detection.

Under The Hood Of RetinaNet

RetinaNet was designed to address the extreme foreground-background class imbalance that limits single-shot object detectors such as YOLO and SSD.

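RetinaNet is a simple dense detector built around Focal Loss. Focal Loss is a reshaped cross-entropy loss that down-weights well-classified examples, so that the many easy negatives do not dominate training.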
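
The focal loss from the RetinaNet paper is FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the model's estimated probability for the true class. Below is a minimal pure-Python sketch for a single binary prediction. The defaults γ = 2 and α = 0.25 are the values the paper reports working best. The function names are illustrative and are not taken from any library. The sketch compares focal loss against plain cross-entropy for an easy negative and for a hard positive.

```python
import math
from typing import Optional


def cross_entropy(p: float, y: int) -> float:
    """Binary cross-entropy for one prediction p = P(y = 1), label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p: float, y: int, gamma: float = 2.0,
               alpha: Optional[float] = 0.25) -> float:
    """FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    Passing alpha=None drops the alpha_t weighting, leaving only the
    (1 - p_t)**gamma modulating factor.
    """
    p_t = p if y == 1 else 1.0 - p
    if alpha is None:
        alpha_t = 1.0
    else:
        alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Easy negative: a background anchor the model already scores at p = 0.1.
easy_ratio = focal_loss(0.1, 0, alpha=None) / cross_entropy(0.1, 0)
# Hard positive: a foreground anchor the model wrongly scores at p = 0.1.
hard_ratio = focal_loss(0.1, 1, alpha=None) / cross_entropy(0.1, 1)

print(f"easy negative keeps {easy_ratio:.2f} of its CE loss")  # 0.01
print(f"hard positive keeps {hard_ratio:.2f} of its CE loss")  # 0.81
```

With γ = 0 and alpha=None the function reduces to ordinary cross-entropy. Raising γ shrinks the loss of confident, correct predictions far more than the loss of mistakes.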

RetinaNet Network Architecture

The classification subnet predicts, at each spatial position, the probability that an object is present for each of the A anchors and each of the K object classes.

This subnet is a small fully convolutional network (FCN) attached to each level of the feature pyramid network (FPN). Its parameters are shared across all pyramid levels.

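The subnet takes an input feature map with C channels from a given pyramid level. It applies four 3 x 3 convolutional layers, each with C filters and each followed by a ReLU activation, and then a final 3 x 3 convolutional layer with KA filters. Sigmoid activations on this last layer produce KA binary predictions per spatial location (C = 256 and A = 9 in the paper).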
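
To make the head's dimensions concrete, here is a small stdlib-only sketch. It lists the classification subnet's layers, counts its parameters, and tallies the predictions it makes across pyramid levels P3 to P7. C = 256 and A = 9 follow the paper, and K = 80 corresponds to the COCO classes. Computing each level's feature-map size as ceil(image_size / 2^level) is a simplifying assumption, since the exact size depends on the backbone's padding. The helper names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# (in_channels, out_channels, activation) for one 3 x 3 conv layer.
Layer = Tuple[int, int, str]

C, K, A = 256, 80, 9  # channels, classes (COCO), anchors per location


def classification_subnet(c: int = C, k: int = K, a: int = A) -> List[Layer]:
    """Four 3x3 convs with C filters + ReLU, then one 3x3 conv with K*A filters + sigmoid."""
    layers: List[Layer] = [(c, c, "relu") for _ in range(4)]
    layers.append((c, k * a, "sigmoid"))
    return layers


def num_params(layers: List[Layer], kernel: int = 3) -> int:
    """Weights plus biases for a stack of kernel x kernel convolutions."""
    return sum(kernel * kernel * cin * cout + cout for cin, cout, _ in layers)


def outputs_per_level(image_h: int, image_w: int, k: int = K, a: int = A,
                      levels: range = range(3, 8)) -> Dict[int, Tuple[int, int, int]]:
    """Pyramid level -> (H, W, K*A sigmoid outputs per location).

    Assumes each level's feature map is ceil(image_size / 2**level) on a side.
    """
    return {
        level: (math.ceil(image_h / 2 ** level), math.ceil(image_w / 2 ** level), k * a)
        for level in levels
    }


print(num_params(classification_subnet()))  # 4019920

shapes = outputs_per_level(800, 800)
total_anchors = A * sum(h * w for h, w, _ in shapes.values())
print(total_anchors)  # 120087
```

Under this sizing assumption, an 800 x 800 input yields roughly 120k anchors. That is the same order of magnitude as the ~100k anchors per image discussed earlier, and it shows why easy negatives vastly outnumber positives.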

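In parallel with the classification subnet, a box regression subnet is attached to each pyramid level. It is another small FCN with the same design, and it regresses the offset from each anchor box to a nearby ground-truth object, if one exists. Instead of class scores, it ends in 4A linear outputs per spatial location, and it is class-agnostic.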
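
The paper uses the standard box parameterization from R-CNN for these regression targets. A minimal sketch of encoding a ground-truth box as offsets relative to an anchor, and decoding predicted offsets back into a box, might look like this. Boxes are assumed to be (x1, y1, x2, y2) tuples, and `encode`/`decode` are hypothetical helper names.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]     # (x1, y1, x2, y2)
Deltas = Tuple[float, float, float, float]  # (dx, dy, dw, dh)


def center_size(box: Box) -> Tuple[float, float, float, float]:
    """Convert corner format to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + 0.5 * w, y1 + 0.5 * h, w, h


def encode(anchor: Box, gt: Box) -> Deltas:
    """Regression target: center shift scaled by anchor size, log-ratio of sizes."""
    ax, ay, aw, ah = center_size(anchor)
    gx, gy, gw, gh = center_size(gt)
    return (gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah)


def decode(anchor: Box, deltas: Deltas) -> Box:
    """Invert encode(): apply predicted offsets to an anchor to get a box."""
    ax, ay, aw, ah = center_size(anchor)
    dx, dy, dw, dh = deltas
    cx, cy = ax + dx * aw, ay + dy * ah
    w, h = aw * math.exp(dw), ah * math.exp(dh)
    return cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h


anchor = (0.0, 0.0, 10.0, 10.0)
gt = (2.0, 2.0, 12.0, 12.0)
deltas = encode(anchor, gt)         # shift of 0.2 anchor widths, no size change
recovered = decode(anchor, deltas)  # back to the ground-truth box
```

The box subnet predicts one such 4-tuple per anchor (4A outputs per location), and the same offsets are used whatever the object's class.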

Each anchor is assigned a classification target. Negative (background) anchors get a vector of all zeros, while positive (foreground) anchors get a one-hot vector for their ground-truth class. Suppose the prediction for an anchor is close to all zeros but its target is a one-hot vector (in other words, a false negative). The focal loss for that anchor box is then large, because the probability assigned to the true class is near zero.
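
The paper assigns anchors to ground-truth boxes by intersection-over-union (IoU). An anchor with an IoU of at least 0.5 with some ground-truth box becomes a positive, one whose best IoU is below 0.4 becomes a negative, and anchors in between are ignored during training. Below is a minimal sketch of that rule, producing the one-hot or all-zeros target vectors described above. `assign_targets` is a hypothetical name, and using `None` to mark ignored anchors is an illustrative choice.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_targets(anchors: Sequence[Box], gt_boxes: Sequence[Box],
                   gt_labels: Sequence[int], num_classes: int,
                   fg_iou: float = 0.5, bg_iou: float = 0.4) -> List[Optional[List[float]]]:
    """One-hot target for positives, all zeros for negatives, None for ignored anchors."""
    targets: List[Optional[List[float]]] = []
    for anchor in anchors:
        # Find the ground-truth box this anchor overlaps most.
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(gt_boxes):
            overlap = iou(anchor, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_iou >= fg_iou:
            target = [0.0] * num_classes
            target[gt_labels[best_idx]] = 1.0      # foreground: one-hot
            targets.append(target)
        elif best_iou < bg_iou:
            targets.append([0.0] * num_classes)    # background: all zeros
        else:
            targets.append(None)                   # IoU in [0.4, 0.5): ignored
    return targets


gt_boxes = [(0.0, 0.0, 10.0, 10.0)]
anchors = [(0.0, 0.0, 10.0, 10.0),    # IoU 1.0   -> positive
           (20.0, 20.0, 30.0, 30.0),  # IoU 0.0   -> negative
           (0.0, 0.0, 10.0, 22.0)]    # IoU ~0.45 -> ignored
print(assign_targets(anchors, gt_boxes, gt_labels=[2], num_classes=3))
# [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], None]
```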

Enhancement With Focal Loss

Focal loss is applied to the output of the classification subnet. Rather than sampling a subset of anchors or mining hard examples, RetinaNet applies it to all of the anchors in each sampled image.

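The total focal loss of an image is the sum of the focal loss over all anchors. This sum is normalized by the number of anchors assigned to a ground-truth box, not by the total number of anchors. The vast majority of anchors are easy negatives that receive negligible loss under focal loss, so counting them would only dilute the normalization.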
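
The normalization can be sketched in pure Python as below. It assumes per-anchor class probabilities and the target vectors described earlier (one-hot for positives, all zeros for negatives), with `None` marking any anchor excluded from training. The probability clamp and the `max(num_pos, 1)` guard for images with no positive anchors are assumptions added to keep the sketch numerically safe. The function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0,
               eps: float = 1e-12) -> float:
    """Per-class focal loss; p is clamped away from 0 and 1 so log() stays finite."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def image_focal_loss(probs: Sequence[Sequence[float]],
                     targets: Sequence[Optional[Sequence[float]]]) -> float:
    """Sum focal loss over all anchors and classes, divided by the positive-anchor count."""
    total, num_pos = 0.0, 0
    for p_vec, t_vec in zip(probs, targets):
        if t_vec is None:          # ignored anchor: contributes nothing
            continue
        if any(t == 1 for t in t_vec):
            num_pos += 1           # anchor assigned to a ground-truth box
        for p, t in zip(p_vec, t_vec):
            total += focal_loss(p, int(t))
    return total / max(num_pos, 1)


# One well-classified positive and 10,000 easy negatives.
probs = [[0.9]] + [[0.01]] * 10_000
targets = [[1.0]] + [[0.0]] * 10_000
print(image_focal_loss(probs, targets))
```

Under plain cross-entropy, these 10,000 easy negatives would contribute about 100 units of loss versus about 0.1 for the positive. Under focal loss, their combined contribution falls below 0.01.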

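Enabled by focal loss, RetinaNet matched the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors at the time of its publication. This broke the trend of one-stage detectors trailing two-stage detectors in accuracy.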

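Initializing RetinaNet requires a prior probability π (about 0.01) for foreground anchors. The bias of the last convolutional layer of the classification subnet is set to b = -log((1 - π)/π). As a result, at the start of training every anchor is labeled as foreground with a confidence of about π. This prior reflects how rare foreground is relative to background. It prevents the large number of background anchors from generating a large, destabilizing loss in the first iteration of training, which is why the value is significant.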
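
The bias initialization is a one-line formula. The quick sketch below checks that b = -log((1 - π)/π) makes the sigmoid output equal π at the start of training. It then compares the initial loss of a single background anchor under this prior with a zero bias, which starts every anchor at p = 0.5. The function names are illustrative.

```python
import math


def prior_bias(pi: float = 0.01) -> float:
    """Bias for the final classification conv layer: b = -log((1 - pi) / pi)."""
    return -math.log((1.0 - pi) / pi)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


b = prior_bias(0.01)
print(round(b, 4))           # -4.5951
print(round(sigmoid(b), 4))  # 0.01

# Initial cross-entropy of one background anchor (target 0) under each init.
loss_prior = -math.log(1.0 - sigmoid(b))    # p starts near 0.01
loss_zero = -math.log(1.0 - sigmoid(0.0))   # p starts at 0.5
print(round(loss_prior, 4), round(loss_zero, 4))  # 0.0101 0.6931
```

Multiplied over roughly 100k mostly-background anchors, the zero-bias start would produce a very large initial loss. That is the instability the prior is meant to avoid.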

Using focal loss in RetinaNet reduces the influence of the many easy negatives on training. As a result, the detector learns to distinguish the background more clearly from foreground objects.

With this training approach, RetinaNet substantially improved on single-shot detection. A few variants of RetinaNet have since appeared, in which researchers introduce an adaptive loss function along with instance mask prediction during training.

Read more about RetinaNet here.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
