Network TV shows are now also streamed on the broadcasters' own OTT platforms. This lets viewers watch their favourite shows at any time, even if they missed the live broadcast. Making this content available on demand, however, raises a serious copyright issue for the songs that feature in many network TV shows. This is a common business problem for all the leading broadcasters. To deal with it, OTT players either mute the Bollywood songs played in the video or replace them with music they have created themselves.
If you are wondering how this actually happens, one way to achieve it is with Shazam. We are not talking about the superhero Shazam but about the music-recognition application owned by Apple. So what does this application do? It can identify music, movies, advertising and television shows from a short sample captured through the microphone or from an existing file on your device. Is it magic? That is the first thought that comes to mind.
No! It works by taking acoustic samples of the music clip and creating a spectrogram from them. This process is referred to as audio fingerprinting. A spectrogram is a time-frequency graph of the clip: it shows how strongly each frequency is present at each moment. Shazam stores millions of such audio fingerprints in its database. Every fingerprint has to be tagged so that the system knows which song it belongs to. Fingerprinting is generally done on clips of about 10 seconds.
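To make the idea of a spectrogram concrete, here is a minimal sketch that slices a signal into overlapping windows and measures the strength of each frequency in every window. It uses only the Python standard library and a naive Fourier transform so that it stays dependency-free; the function name `spectrogram`, the window size and the test tone are illustrative assumptions, and a real system would use an optimized FFT library on actual audio.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this toy example)
WINDOW = 64         # samples per analysis window (assumed)


def spectrogram(samples, window_size=WINDOW, hop=WINDOW // 2):
    """Return one list of frequency-bin magnitudes per overlapping window."""
    frames = []
    for start in range(0, len(samples) - window_size + 1, hop):
        chunk = samples[start:start + window_size]
        # A Hann taper reduces leakage between neighbouring frequency bins.
        tapered = [
            s * 0.5 * (1 - math.cos(2 * math.pi * n / (window_size - 1)))
            for n, s in enumerate(chunk)
        ]
        frame = []
        # Naive DFT: only the bins up to half the sample rate are needed.
        for k in range(window_size // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / window_size)
                for n, x in enumerate(tapered)
            )
            frame.append(abs(acc))
        frames.append(frame)
    return frames


# A pure 1000 Hz tone stands in for a music clip.
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(512)]
frames = spectrogram(tone)

# The strongest bin of the first window, converted back to Hz.
loudest_bin = max(range(len(frames[0])), key=lambda k: frames[0][k])
loudest_hz = loudest_bin * SAMPLE_RATE / WINDOW
print(len(frames), "windows; strongest frequency:", loudest_hz, "Hz")
```

Each row of `frames` is one time slice and each column one frequency band, which is exactly the time-frequency grid a fingerprinting system searches for distinctive peaks.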
The next task is to recognize the captured sound by finding a match for its fingerprint among the millions of fingerprints in the database. If a match is found, the song name, artist and album are returned to the user. The application works even with some background noise. Some companies are building on this approach to solve the copyright problem described above. Shazam has also offered a paid version called Shazam Encore.
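The matching step can be sketched as a voting scheme. This is one common way to do it and not necessarily what Shazam runs in production: every hash in the recorded sample that also occurs in a known song votes for that song together with the time shift between the two occurrences, and a genuine match piles its votes onto a single shift. The hash strings, song names and function names below are made up for illustration.

```python
from collections import Counter, defaultdict


def build_index(library):
    """Map each fingerprint hash to the (song, offset) pairs where it occurs."""
    index = defaultdict(list)
    for song, fingerprints in library.items():
        for fp_hash, offset in fingerprints:
            index[fp_hash].append((song, offset))
    return index


def best_match(index, sample):
    """Return (song, shift, votes) for the best-aligned song, or None."""
    votes = Counter()
    for fp_hash, sample_offset in sample:
        for song, song_offset in index.get(fp_hash, []):
            # Hashes from the same recording agree on one time shift.
            votes[(song, song_offset - sample_offset)] += 1
    if not votes:
        return None
    (song, shift), count = votes.most_common(1)[0]
    return song, shift, count


# Toy library: each song is a list of (hash, time offset) fingerprints.
library = {
    "song_a": [("h1", 0), ("h2", 3), ("h3", 7), ("h4", 12), ("h5", 15)],
    "song_b": [("h9", 1), ("h2", 4), ("h8", 9), ("h7", 11)],
}
index = build_index(library)

# A sample recorded 7 time units into song_a, plus one noisy hash ("hx").
sample = [("h3", 0), ("h4", 5), ("h5", 8), ("hx", 9)]
result = best_match(index, sample)
print(result)
```

Because unrelated hashes scatter their votes across many different shifts, a few spurious hashes from background noise do not outvote a real match.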
The main requirement for using a Shazam-style approach in your organization is a finite library of the songs used in the background of the network shows. This is largely a data-availability challenge. Historically, this data was not maintained, which makes the problem very difficult to solve. Even where the data exists, it may only be available in formats such as Excel sheets. Restricted availability and haphazard storage of data therefore make this a tough problem.
If you set out to solve this problem, you would want to automate the task of audio fingerprinting and recognition.
Python has several libraries for audio fingerprinting:
Dejavu is a well-known audio fingerprinting and recognition implementation. It is open-source code, available to everyone. It fingerprints audio along similar lines to Shazam: it creates audio fingerprints for the music by memorizing the clips. Then, by playing a song and recording microphone input, Dejavu attempts to match the audio against the fingerprints held in the database, returning the song being played.
The database used is MySQL. The “fingerprints” are locality-sensitive hashes that are computed from the spectrogram of the audio. This is done by taking the FFT of the signal over overlapping windows of the song and identifying peaks. The efficiency and performance of the algorithm are impressive: accuracy gets close to 100% by the sixth second of the recording.
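The peak-and-hash step can be sketched as follows. This is a simplified illustration under stated assumptions and not Dejavu's actual code: the spectrogram is a tiny hand-written grid, a peak is any cell above a threshold that beats its four direct neighbours, and each peak is paired with the next few peaks to form a hash of the two frequencies and the time gap between them. The names `find_peaks`, `hash_peaks` and `fan_out`, and the choice of a truncated SHA-1, are hypothetical.

```python
import hashlib


def find_peaks(spec, threshold):
    """Return (time, freq) cells above threshold that beat their 4 neighbours."""
    peaks = []
    for t, row in enumerate(spec):
        for f, value in enumerate(row):
            if value < threshold:
                continue
            neighbours = [
                spec[tt][ff]
                for tt, ff in ((t - 1, f), (t + 1, f), (t, f - 1), (t, f + 1))
                if 0 <= tt < len(spec) and 0 <= ff < len(spec[tt])
            ]
            if all(value > n for n in neighbours):
                peaks.append((t, f))
    return peaks


def hash_peaks(peaks, fan_out=3):
    """Pair each peak with the next few peaks and hash (f1, f2, time gap)."""
    ordered = sorted(peaks)
    fingerprints = []
    for i, (t1, f1) in enumerate(ordered):
        for t2, f2 in ordered[i + 1:i + 1 + fan_out]:
            key = f"{f1}|{f2}|{t2 - t1}".encode()
            # Store the hash with the time of its first peak (the offset).
            fingerprints.append((hashlib.sha1(key).hexdigest()[:20], t1))
    return fingerprints


# Toy spectrogram: rows are time slices, columns are frequency bins.
spec = [
    [0, 9, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 8, 0],
    [0, 0, 0, 0, 0],
    [7, 0, 0, 0, 0],
]
peaks = find_peaks(spec, threshold=5)
fingerprints = hash_peaks(peaks)
print(peaks)
print(fingerprints)
```

Because each hash encodes only the two frequencies and the gap between the peaks, the same passage produces the same hashes wherever it starts in a recording; only the stored offset changes.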
Chromaprint and its associated Acoustid Web service make up a high-quality, open-source acoustic fingerprinting system. The pyacoustid package provides Python bindings for both the fingerprinting algorithm library, which is written in C but portable, and the Web service, which provides fingerprint lookups.
Audiophile is a similar implementation to Dejavu, in which audio fingerprints are created by evaluating the Fast Fourier Transform (FFT) of the signal. It looks for the frequencies and time gaps between notes to match. If the original audio file has been manipulated in some way (changed pitch or tempo) or is a different recording of the same song (like a live version), it will usually not match. This can be overcome by making it learn all the popular versions of a song. The main difference from Dejavu is that it takes around 10-15 seconds to recognize a song.
The Dejavu library comes from a GitHub implementation. One of its major shortcomings is the memory constraint encountered while fingerprinting whole songs into the database. To overcome this issue, an SQLite database can be used instead of MySQL for storing the audio fingerprints. The major advantage of SQLite is that it minimizes external data-transfer calls: it runs in-process, and Python ships with the sqlite3 module in its standard library.
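A minimal sketch of an SQLite-backed fingerprint store, using only the standard-library `sqlite3` module, might look like this. The table layout, column names and helper functions are assumptions made for illustration and are not Dejavu's schema; fingerprints are written in batches so that a whole song never has to be held in a single insert.

```python
import sqlite3
from itertools import islice


def open_store(path=":memory:"):
    """Open (or create) the fingerprint database and its tables."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS songs "
        "(id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fingerprints "
        "(hash TEXT NOT NULL, song_id INTEGER NOT NULL, "
        "time_offset INTEGER NOT NULL)"
    )
    # The index keeps hash lookups fast as the library grows.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_hash ON fingerprints(hash)")
    return conn


def add_song(conn, name, fingerprints, batch_size=1000):
    """Insert a song and its (hash, offset) fingerprints in small batches."""
    stream = iter(fingerprints)
    with conn:  # commits on success, rolls back on error
        song_id = conn.execute(
            "INSERT INTO songs (name) VALUES (?)", (name,)
        ).lastrowid
        while True:
            batch = list(islice(stream, batch_size))
            if not batch:
                break
            conn.executemany(
                "INSERT INTO fingerprints (hash, song_id, time_offset) "
                "VALUES (?, ?, ?)",
                [(fp_hash, song_id, offset) for fp_hash, offset in batch],
            )
    return song_id


def lookup(conn, hashes):
    """Return (song name, hash, offset) rows for the given hashes."""
    marks = ",".join("?" for _ in hashes)
    query = (
        "SELECT s.name, f.hash, f.time_offset FROM fingerprints f "
        "JOIN songs s ON s.id = f.song_id "
        f"WHERE f.hash IN ({marks})"
    )
    return conn.execute(query, list(hashes)).fetchall()


conn = open_store()
add_song(conn, "song_a", [("h1", 0), ("h2", 3), ("h3", 7)], batch_size=2)
print(lookup(conn, ["h2"]))
```

Passing a file path instead of `":memory:"` keeps the fingerprints on disk between runs, with no database server to install or configure.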
This minimizes out-of-memory errors and makes both fingerprinting and recognition run smoothly. Using these implementations, media houses can perform audio fingerprinting and recognition and automate the entire process of muting Bollywood songs or replacing them with self-created music.
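Once recognition has produced the time ranges in which a copyrighted song plays, the muting step could be automated along these lines. This is only a sketch under assumptions: the detections are made-up (start, end) pairs in seconds, the helper names are hypothetical, and the generated string assumes FFmpeg's `volume` audio filter with its `enable` timeline option, which is one possible tool for the job and is not mentioned in the article.

```python
def merge_intervals(detections, gap=1.0):
    """Merge detections that overlap or lie within `gap` seconds of each other."""
    merged = []
    for start, end in sorted(detections):
        if merged and start - merged[-1][1] <= gap:
            # Extend the previous range instead of starting a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


def mute_filter(intervals):
    """Build an audio-filter expression that silences the given ranges."""
    windows = "+".join(f"between(t,{s:g},{e:g})" for s, e in intervals)
    return f"volume=enable='{windows}':volume=0"


# Hypothetical recognition output: ranges (in seconds) where a song was found.
detections = [(30.0, 40.0), (40.5, 50.0), (120.0, 130.0), (10.0, 12.0)]
intervals = merge_intervals(detections)
audio_filter = mute_filter(intervals)
print(intervals)
print(audio_filter)
```

Merging nearby detections first avoids rapid on-off muting when the recognizer briefly loses the song, for example under dialogue.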
I am an engineer at heart, and data science is my passion. I have a Master's in Data Science from NMIMS. I have worked on machine learning, image classification and reinforcement learning problems. Solving complex problems and finding simple solutions is what I practice. "Avid reader and writer" describes me best.