How YouTube’s ML Algorithm Earned Billions For Music Producers By Building Fingerprints For Songs

Before the rise of audio streaming services such as Gaana, Saavn and JioMusic, the premier destination for music was YouTube. A large number of songs from Bollywood record labels were available on the platform for users’ listening pleasure. It was a model that worked for everyone: users were not charged for listening to the music, and the copyright holders still benefitted.

Companies such as T-Series rode this trend to its peak and are now going after the crown of YouTube’s most subscribed channel. Taking a step back and looking at the bigger picture, the question arises as to how one of the most commercialised and copyright-heavy channels is so accessible to the masses.


In a world of rampant piracy and copyright infringement, YouTube sits at the intersection of paywalling and facilitating theft. As with every new frontier, a watchdog and guardian of YouTube’s copyright-friendly atmosphere emerged. This is the system we know today as Content ID.

Why Copyright Was The First Frontier

The Internet was a free space in its infancy, and could very well be compared to the Wild West when it first emerged. As the net was opened to the masses, software pirates rushed onto the scene and left a lasting impression on record labels and producers. The message was that the Internet was a place where copyright could not be enforced, at least not with any real force.

Websites were protected against legal action by the safe harbour provisions of the Digital Millennium Copyright Act (DMCA). These provisions meant that copyrighted content uploaded by users could exist on a website without the site being held liable, as long as there was a way to remove it when the rights holder complained.

Record labels and music producers kept away from putting music online for this very reason. Copyright infringement was the bane of their existence, sucking away at what little profits they were making as the Internet took over the world. Because the DMCA guidelines were so permissive, a pirate could very easily make duplicates of a song posted on a platform by a copyright holder, leading to infringement and theft.

When Google acquired YouTube, this was the precise issue it wished to avoid. Examples such as Grooveshark and Napster stuck with the top honchos at Google, who needed a solution quickly to avert the possibility of being sued by record labels. This prompted them, as early as 2007, to create a system that would automatically flag copyrighted content. They looked to make YouTube into a platform where companies could upload copyrighted content and get compensated for it without unauthorised pirates trying to capture market share.

What Is Content ID

Their solution was Content ID. The system gives rights holders an automated way of finding unauthorised copies of videos and songs, and then the choice to block the content or run ads against it. Videos are checked at upload to detect copyright-protected material.

The copyright holder can then block the matching video, mute it in the case of audio infringement, or restrict it from playing on certain platforms. However, if the holder wishes to keep the content on the platform, they can choose to monetise it with ads and take the revenue from the views for themselves.
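
The choices described above can be pictured as a small policy table. The following is only an illustrative sketch: the names `Policy`, `Claim` and `apply_claim` are hypothetical and do not come from YouTube, and the real system is certainly more involved.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    """Hypothetical names for the options a rights holder can pick."""
    BLOCK = "block"
    MUTE = "mute"
    RESTRICT_PLATFORMS = "restrict_platforms"
    MONETISE = "monetise"


@dataclass
class Claim:
    """A matched video together with the rights holder's chosen policy."""
    rights_holder: str
    policy: Policy
    blocked_platforms: tuple = ()  # only used by RESTRICT_PLATFORMS


def apply_claim(claim, platform):
    """Describe what a viewer on `platform` would experience for a claimed video."""
    outcome = {"playable": True, "audio": True, "ad_revenue_to": None}
    if claim.policy is Policy.BLOCK:
        outcome["playable"] = False
    elif claim.policy is Policy.MUTE:
        outcome["audio"] = False
    elif claim.policy is Policy.RESTRICT_PLATFORMS:
        outcome["playable"] = platform not in claim.blocked_platforms
    elif claim.policy is Policy.MONETISE:
        outcome["ad_revenue_to"] = claim.rights_holder
    return outcome


if __name__ == "__main__":
    claim = Claim("Example Records", Policy.MONETISE)
    print(apply_claim(claim, "mobile"))
```

Keeping the video playable under the monetise policy is what lets the uploader, the viewer and the rights holder all get something out of the same upload.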

The system is treated as the first line of defence, and if a claim is made without basis, it can be disputed by the creator. YouTube has revealed the success of the venture: the system has generated $2 billion in revenue for rights holders. It also represents a proactive method of handling problems, as it handles 98% of copyright management on YouTube.

How Does It Work

As mentioned previously, the Content ID system scans videos at upload to compare them against a reference library. This library contains over 50 million copyrighted works, provided by rights holders, which add up to a combined watch time of more than 500 years. Because this content comes from thousands of hand-selected partners, works owned by anyone outside that group are not covered, leaving small chinks in the armour of the copyright system.

It uses audio and video fingerprinting technology to detect matches between the videos people upload and the reference files. Videos are analysed frame by frame for fingerprinting, and matching images are identified with the help of a heat map visualisation that compares frame data from two videos side by side.
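
To make the frame-by-frame comparison more concrete, here is a minimal sketch in Python. It is not YouTube's algorithm, which has not been published in detail; it swaps in a simple "average hash" as the fingerprint, and every function name is hypothetical. Each frame is assumed to be a tiny grayscale image given as rows of 0-255 values. The grid it produces is the kind of data a heat map would display.

```python
def average_hash(frame):
    """Fingerprint a frame: one bit per pixel, 1 where the pixel is brighter than the frame mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)


def similarity(hash_a, hash_b):
    """Fraction of matching bits between two equal-length hashes (1.0 means identical)."""
    matches = sum(a == b for a, b in zip(hash_a, hash_b))
    return matches / len(hash_a)


def similarity_grid(upload_frames, reference_frames):
    """Rows are upload frames, columns are reference frames; each cell is a similarity score."""
    upload = [average_hash(f) for f in upload_frames]
    reference = [average_hash(f) for f in reference_frames]
    return [[similarity(u, r) for r in reference] for u in upload]


def best_matches(grid, threshold=0.9):
    """For each upload frame, the index of its closest reference frame, or None if below threshold."""
    result = []
    for row in grid:
        best = max(range(len(row)), key=row.__getitem__)
        result.append(best if row[best] >= threshold else None)
    return result


if __name__ == "__main__":
    reference = [
        [[10, 200], [10, 200]],
        [[200, 10], [200, 10]],
        [[10, 10], [200, 200]],
    ]
    # The upload re-uses the first and third reference frames, slightly brightened.
    upload = [
        [[40, 230], [40, 230]],
        [[30, 30], [220, 220]],
    ]
    grid = similarity_grid(upload, reference)
    for row in grid:
        print(row)
    print(best_matches(grid))
```

Because the hash only records which pixels are brighter than the frame's own average, a uniform brightness change leaves the fingerprint untouched. A mirrored or re-cropped frame would not survive this simple scheme.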

The system utilises a finite-state transducer algorithm for fingerprinting music. This allows it to find matches despite alterations such as beeps inserted in the middle of songs, changes in pitch, volume and speed, and audio overlays and effects.
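
The sketch below illustrates only one of these properties, robustness to volume changes, and it does so with a deliberately simpler technique than the finite-state transducer approach the article mentions: it records whether the signal's energy rises or falls from one short window to the next. All names are hypothetical, and handling pitch or speed changes would need considerably more machinery.

```python
def window_energies(samples, window):
    """Sum of squared samples in each non-overlapping window of the signal."""
    return [
        sum(s * s for s in samples[i:i + window])
        for i in range(0, len(samples) - window + 1, window)
    ]


def fingerprint(samples, window=4):
    """One bit per pair of neighbouring windows: 1 if the energy rises, 0 otherwise."""
    energies = window_energies(samples, window)
    return [1 if later > earlier else 0 for earlier, later in zip(energies, energies[1:])]


def bit_error_rate(fp_a, fp_b):
    """Fraction of differing bits over the overlapping part of two fingerprints."""
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        raise ValueError("fingerprints must not be empty")
    return sum(a != b for a, b in zip(fp_a[:n], fp_b[:n])) / n


if __name__ == "__main__":
    song = [1, 2, 1, 2, 5, 6, 5, 6, 2, 1, 2, 1, 8, 9, 8, 9, 3, 3, 3, 3]
    quieter_copy = [0.5 * s for s in song]  # same song at half the volume
    other_song = [9, 9, 9, 9, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 5, 5, 5, 5]

    print(bit_error_rate(fingerprint(song), fingerprint(quieter_copy)))
    print(bit_error_rate(fingerprint(song), fingerprint(other_song)))
```

Scaling the volume multiplies every window's energy by the same factor, so the rise-or-fall pattern is unchanged and the quieter copy still matches, while an unrelated signal produces a clearly different bit pattern.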

To supply this large amount of compute, YouTube harnesses Google’s Brain deep learning system. Content ID runs directly on this platform, which provides multiple advantages. The deep learning framework also allows developers to account for alterations that would otherwise cause Content ID to fail, such as flipping the video or changing its aspect ratio.

The deep learning smarts of Google’s system allow for a much more organic and easy way to launch new iterations of the fingerprinting system, since the neural network can be retrained easily and much faster.

The Future Of Automated Copyrighting

Even as the music industry is largely moving away from a focus on copyright to more pressing matters such as artists receiving remuneration, YouTube’s move remains one of the most important in the space. It not only ensured the future of the platform as we know it today but also set the standard for other platforms such as Twitter to engage in copyright protection for content on their platforms. Content ID thus served its purpose as a stopgap on the way to a world of more complex issues in the music industry, while ensuring a fair outcome for everyone.

Anirudh VK
I am an AI enthusiast and love keeping up with the latest events in the space. I love video games and pizza.
