Facebook’s parent company Meta has released an AI model called SAM, short for Segment Anything Model. The model aims to segment parts of an image and to pick out objects that it has never seen before.
The model owes this ability to its training dataset, SA-1B. Meta claims this is the most extensive dataset of its kind to date: it contains 11 million images and 1.1 billion segmentation masks, the latter produced by Meta’s own segmentation model.
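To put those two figures in proportion, a quick back-of-the-envelope calculation gives the average number of masks per image. This is only arithmetic on the numbers quoted above; the constant and function names are illustrative and do not come from Meta.

```python
# Figures quoted in the article for the SA-1B dataset.
TOTAL_MASKS = 1_100_000_000   # 1.1 billion segmentation masks
TOTAL_IMAGES = 11_000_000     # 11 million images


def average_masks_per_image(total_masks: int, total_images: int) -> float:
    """Return the mean number of masks per image."""
    return total_masks / total_images


if __name__ == "__main__":
    print(average_masks_per_image(TOTAL_MASKS, TOTAL_IMAGES))  # 100.0
```

In other words, the quoted totals work out to roughly 100 masks for every image in the dataset.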
Image segmentation is an integral part of computer vision technologies and algorithms. Big Tech companies such as Google and Amazon have also been working on computer vision for a while now.
Meta VS Google VS Amazon
In 2019, Google released high-performance TPU implementations of two state-of-the-art segmentation models as open source code: Mask R-CNN for instance segmentation and DeepLab v3+ for semantic segmentation.
Amazon, for its part, has worked on learning to segment images without manually segmented training data. Its researchers developed Box2Seg, an instance segmentation model that predicts object masks and bounding boxes in a single step.
The model uses a combination of region proposal networks (RPNs) and convolutional neural networks (CNNs) to detect and segment objects in images. The RPNs propose object regions in the image, and the CNNs refine the proposals and predict the segmentation masks.
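The propose-then-refine flow described above can be illustrated with a toy sketch. It is not Box2Seg or Mask R-CNN: a sliding window stands in for the region proposal network, a plain foreground check stands in for the CNN, and every name here (`propose_regions`, `refine`, `IMAGE`) is hypothetical.

```python
# Boxes are (top, left, bottom, right) with bottom/right exclusive.
# The "image" is a grid of 0 (background) and 1 (foreground) pixels.


def propose_regions(image, window=2, min_fill=0.5):
    """Stage 1 (stand-in for an RPN): keep windows that are mostly foreground."""
    rows, cols = len(image), len(image[0])
    proposals = []
    for top in range(rows - window + 1):
        for left in range(cols - window + 1):
            filled = sum(
                image[r][c]
                for r in range(top, top + window)
                for c in range(left, left + window)
            )
            if filled / (window * window) >= min_fill:
                proposals.append((top, left, top + window, left + window))
    return proposals


def refine(image, box):
    """Stage 2 (stand-in for the CNN): tighten the box and return its mask."""
    top, left, bottom, right = box
    mask = {
        (r, c)
        for r in range(top, bottom)
        for c in range(left, right)
        if image[r][c] == 1
    }
    if not mask:  # nothing to segment inside this proposal
        return None, mask
    rows_hit = [r for r, _ in mask]
    cols_hit = [c for _, c in mask]
    tight_box = (min(rows_hit), min(cols_hit), max(rows_hit) + 1, max(cols_hit) + 1)
    return tight_box, mask


IMAGE = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

if __name__ == "__main__":
    for box in propose_regions(IMAGE):
        print(box, "->", refine(IMAGE, box))
```

A real system learns both stages from data; the sketch only shows how coarse proposals are narrowed down to a tighter box and a pixel mask.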
All four models have demonstrated impressive results on benchmark datasets, and they have their own strengths and weaknesses. Box2Seg and Mask R-CNN are particularly useful when precise object localization is required, while SAM and DeepLab v3+ are more flexible and can be used for a wider range of segmentation tasks.
In summary, while Amazon, Meta, and Google are all conducting research in the area of segmentation in computer vision, they differ in their specific research areas and methodologies. Amazon has developed instance segmentation and semantic segmentation models, Meta has developed general-purpose object segmentation models, and Google has developed a range of segmentation techniques for semantic, instance, and panoptic segmentation.
One SAM, Many Use Cases
Meta said in its blog that SAM is a generalized segmentation model that combines two classic approaches to segmentation: interactive and automatic.
According to the company, SAM can perform both interactive and automatic segmentation with a flexible prompt, enabling a wide range of segmentation tasks.
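The difference between the two modes can be sketched with a toy example. This is not SAM or its API: a simple flood fill stands in for the learned model, and the names `segment_from_point`, `segment_everything` and `IMAGE` are hypothetical. Interactive mode takes one point as a prompt and returns one mask; automatic mode issues prompts across the whole image and collects every mask.

```python
from collections import deque


def segment_from_point(image, point):
    """Interactive mode: mask of same-valued pixels connected to `point` (row, col)."""
    rows, cols = len(image), len(image[0])
    target = image[point[0]][point[1]]
    mask = {point}
    queue = deque([point])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in mask and image[nr][nc] == target:
                mask.add((nr, nc))
                queue.append((nr, nc))
    return mask


def segment_everything(image):
    """Automatic mode: prompt with every pixel not yet covered by a mask."""
    covered = set()
    masks = []
    for r in range(len(image)):
        for c in range(len(image[0])):
            if (r, c) not in covered:
                mask = segment_from_point(image, (r, c))
                masks.append(mask)
                covered |= mask
    return masks


# A tiny "image" whose pixel values mark four separate regions.
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 0],
]

if __name__ == "__main__":
    print(sorted(segment_from_point(IMAGE, (0, 0))))  # one mask for one prompt
    print(len(segment_everything(IMAGE)))             # 4
```

The same prompt-driven routine serves both modes, which is the sense in which a single promptable interface can cover interactive and automatic segmentation.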
One of the possible use cases underlined by Meta is SAM’s use in the AR/VR domain, where it could enable selecting an object based on a user’s gaze and then “lifting” it into 3D.
SAM could be useful, without additional training, in any field that requires finding and segmenting objects, such as cell microscopy. It could also be useful for scientific studies and content creation.
According to Meta, SAM has learned a general notion of what objects are and can generate masks for any object in any image or video, including objects and image types that it did not encounter during training.
This announcement also signals that Meta does not intend to let competitors pass it by in the AI race.
While Meta has struck a chord with its significant body of AI research and certain breakthroughs, it has been struggling to integrate them into its products, such as Instagram and Facebook. The company has moved away from the metaverse to direct its complete focus on generative AI.
Consequently, CEO Mark Zuckerberg announced a new product group at Meta that is working on AI products for Instagram and WhatsApp.
According to Zuckerberg’s Facebook post, multiple teams within Meta will be merged to form the new unit, which will be headed by the current Chief Product Officer, Chris Cox.
The objective of this unit is to develop innovative and communicative tools to be utilized in Meta’s products.