How Is Voot Using Computer Vision To Scale-up Their Offerings

Srishti Deoras

One of the interesting areas where computer vision is being explored is video streaming. Video streaming has become one of the most important fields of the last decade, especially with the lockdown, which has driven viewership on these platforms higher than ever before. 

To boost the user experience, many of these companies are using computer vision (CV). Addressing the attendees at CVDC 2020, Anubhav Shrivastava, head of Data Sciences at Voot Viacom18, shared some of the ways that Voot is using computer vision. 

Computer Vision Has Gained Traction Over The Years

Shrivastava shared that the field of computer vision has been evolving over the years and has seen a sharp increase in applications over time. He defined CV as the study of techniques that help machines see, interpret and create visuals. 



Common application areas of computer vision include surveillance, biometrics, heat maps, checkouts, facial recognition, medical imaging, remote sensing, 3D model building, intelligent image processing and more. Some of the most widely used methods in computer vision are object detection, image classification, semantic segmentation, instance segmentation, deep tracking, GANs and more. 

Computer Vision At Voot

Voot is an ad-supported and subscription video streaming platform available in India. It has more than 100 million peak monthly users. To operate at such a large scale, Voot depends heavily on emerging technologies such as artificial intelligence and machine learning, of which computer vision is a crucial part. 

Shrivastava shared that computer vision plays a crucial role in video streaming as it helps improve the user experience, increase user retention, raise efficiency through automation, reduce cost, increase monetisation opportunities, increase user acquisition and more. "It has been used both at the supply side to create content and at the consumption side to increase user experience," he said. 

Some of the areas where Voot is using computer vision are: 

Ads cue point detector: Ads are placed at intervals within video content, and their placement plays a key role in making or breaking viewership. Ads have to be placed strategically, as an ad in the wrong place or at the wrong time may lead the audience to leave. Placing ads so that they sit naturally within the flow of the content is crucial. Using CV has helped Voot decide where to place ads, ensure that the viewing experience is not hampered, replace manual detection, eliminate human error, handle scale and open a pathway for experimentation. 

Contextual ads: Another important aspect is placing ads that are relevant to the user based on the objects that appear in the video. Using technologies such as object detection, facial recognition, colour recognition and more, Voot draws on human psychology to mimic human behaviour. It has been able to drastically increase user engagement and propensity to buy, and to ensure higher monetisation. 

Product placements: Product placement is a popular concept in Hollywood and is now used in videos all across the globe. Most of the time, either the content is created first and buyers are sought afterwards, or the content is made for TV and then moved online. In situations like these, it may be difficult to ensure ad placements. This is where product placements come into the picture, with product logos strategically placed into the videos. Computer vision comes in handy to detect whitespaces, insert logos using deep tracking, enable post-publishing advertising, and more. 

Highlight creation: Highlights, or short video content, are in demand as most users wish to see the important moments from episodes in one go. Creating highlights in sports is relatively easy, as important moments can be detected from the audio in the form of audience cheering or high-pitched commentary, but it is challenging to do this in non-sports videos. Shrivastava says that it is not only about decibels but also requires looking into the defining moments, listening to the video intently, observing facial expressions and more. Voot creates a lot of highlights and is experimenting with computer vision to produce them at scale. This is not in production yet, but the experiment is on. 

Augmented reality: Voot is also looking to include augmented reality in its shows to increase user engagement. Especially in the times of COVID, the aim is to make users feel a part of the show while at home. This too is in an experimental phase, and Shrivastava believes that it will be a reality in 2-5 years. 
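
The talk summary does not describe how Voot's ads cue point detector works. As a rough illustration of the general idea, a common starting point is shot-boundary detection: compare the brightness histograms of consecutive frames and treat a large jump as a candidate ad break. The sketch below assumes frames are already decoded into flat lists of grayscale pixel values (0 to 255); the function names, the 16-bin histogram and the 0.5 threshold are all hypothetical choices made for illustration, not Voot's method.

```python
from typing import List, Sequence

# Assumption: each frame is a flat sequence of grayscale pixels in 0..255.
Frame = Sequence[int]


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Return a normalised brightness histogram for one frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame) or 1  # avoid dividing by zero on an empty frame
    return [count / total for count in counts]


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def candidate_cue_points(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Return indices of frames that start a new shot (candidate ad breaks)."""
    cuts: List[int] = []
    previous = None
    for index, frame in enumerate(frames):
        current = histogram(frame)
        if previous is not None and histogram_distance(previous, current) > threshold:
            cuts.append(index)
        previous = current
    return cuts


if __name__ == "__main__":
    dark = [10] * 64     # a uniformly dark 8x8 frame
    bright = [240] * 64  # a uniformly bright 8x8 frame
    video = [dark, dark, dark, bright, bright, dark]
    print(candidate_cue_points(video))  # [3, 5]
```

A production system would need far more than this, for example avoiding cuts in the middle of dialogue, but the sketch shows why automated detection scales better than manual tagging: the same rule can be applied to every frame of every video.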

Shrivastava concluded by sharing that the key goals for which computer vision has been deployed are increasing user engagement, improving user experience and increasing monetisation. Voot has been using techniques such as group normalisation and video-to-video synthesis to create thematic content that caters to specific audiences. 
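
The sports case mentioned under highlight creation, where cheering or excited commentary marks an important moment, can be sketched as a simple loudness check. The example below is a minimal illustration that assumes the audio track has already been decoded into a list of float samples; `loud_segments`, the window size and the `factor` of 2.0 are hypothetical choices, not something described in the talk. As Shrivastava notes, loudness alone is not enough for non-sports content.

```python
import math
from typing import List, Sequence, Tuple


def rms(window: Sequence[float]) -> float:
    """Root-mean-square level of a window of audio samples."""
    if not window:
        return 0.0
    return math.sqrt(sum(sample * sample for sample in window) / len(window))


def loud_segments(
    samples: Sequence[float], window_size: int, factor: float = 2.0
) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges that are much louder than the median window.

    A window counts as loud when its RMS exceeds `factor` times the median RMS.
    Adjacent loud windows are merged into one segment.
    """
    windows = [samples[i:i + window_size] for i in range(0, len(samples), window_size)]
    levels = [rms(window) for window in windows]
    if not levels:
        return []
    baseline = sorted(levels)[len(levels) // 2]  # median window level

    segments: List[Tuple[int, int]] = []
    for index, level in enumerate(levels):
        if level > factor * baseline:
            start = index * window_size
            end = min(start + window_size, len(samples))
            if segments and segments[-1][1] == start:
                segments[-1] = (segments[-1][0], end)  # extend the previous segment
            else:
                segments.append((start, end))
    return segments


if __name__ == "__main__":
    quiet = [0.1, -0.1] * 2  # four low-amplitude samples
    loud = [0.9, -0.9] * 2   # four high-amplitude samples
    track = quiet * 3 + loud * 2 + quiet * 3
    print(loud_segments(track, window_size=4))  # [(12, 20)]
```

Using the median as the baseline keeps a few very loud moments from raising the threshold for the whole track.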


Copyright Analytics India Magazine Pvt Ltd
