
Grammy For AI? How Deepsing Makes Music Videos


In an episode called ‘The Devil’s Hands Are Idle Playthings’ from the TV series Futurama, the show’s protagonist Fry plays a musical instrument called the Holophonor: a clarinet-like instrument with a holographic lens that projects images based on the mood of the music. The show is set in the 30th century, so it is entirely understandable if one cannot comprehend the technology behind it.

However, Greek researchers Nikolaos Passalis and Stavros Doropoulos have tried to replicate something similar using machine learning algorithms. In their work, titled Deepsing, they demonstrate the idea behind this quirky application.

Deepsing Overview

via Futurama’s Holophonor

Taking inspiration from Futurama’s Holophonor, Deepsing is designed to translate audio into images. It performs attribute-based music-to-image translation and synthesizes visual stories according to the sentiment expressed by a song.

The sentiment-aware generated images aim to induce in viewers the same feelings as the original song, reinforcing the primary aim of music, i.e., communicating feelings.

This is how Deepsing comes up with visuals (sketched in code after the list):

  • First, it classifies music segments based on valence and arousal.
  • The audio and the associated sentiments are then mapped to image categories.
  • The sentiment in the images is enhanced using neural style transfer.
  • Finally, GANs generate the images that form novel visual stories.
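
A minimal sketch of this pipeline is shown below. Every function and the attribute table are hypothetical placeholders standing in for the trained models the paper describes, not Deepsing’s actual API:

```python
import random

# Minimal sketch of a deepsing-style pipeline. All names here are
# hypothetical placeholders, not the project's real code.

def classify_valence_arousal(segment):
    # A real implementation would run a deep audio classifier over the
    # segment's spectrogram; here we return placeholder scores in [-1, 1].
    return random.uniform(-1, 1), random.uniform(-1, 1)

# Toy attribute table: each image class annotated with the
# (valence, arousal) it tends to evoke.
CLASS_ATTRIBUTES = {
    "palace":      (0.8, 0.4),
    "feather boa": (0.6, 0.9),
    "prison":      (-0.8, 0.2),
}

def map_sentiment_to_class(valence, arousal):
    # Pick the image class whose annotated sentiment is closest to the
    # segment's sentiment in valence-arousal space.
    return min(
        CLASS_ATTRIBUTES,
        key=lambda c: (CLASS_ATTRIBUTES[c][0] - valence) ** 2
                    + (CLASS_ATTRIBUTES[c][1] - arousal) ** 2,
    )

def music_to_visual_story(audio_segments):
    # Map each segment to an image class; a GAN would then render the
    # class and neural style transfer would reinforce the mood.
    return [map_sentiment_to_class(*classify_valence_arousal(seg))
            for seg in audio_segments]

print(music_to_visual_story(["intro", "riff", "outro"]))
```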

For example, sample frames were generated using the song “Chop Suey!” by System Of A Down, with keyframes selected along with annotations of the corresponding affective content of the song. Note the generated “Feather Boa” during the most arousing riff of the song and the transition to a “prison” as the valence of the song decreases.

The process of music-to-image translation poses unique challenges, mainly due to the unstable mapping between the different modalities involved in this process. 

In this paper, the authors employ a trainable cross-modal translation method to overcome this limitation, leading to a first-of-its-kind deep learning method for generating sentiment-aware visual stories.
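
That trainable translation can be pictured as a small network learning to map audio sentiment features to a distribution over image classes. The sketch below is an illustrative stand-in, not the authors’ architecture; the layer sizes and class count are assumptions:

```python
import torch
import torch.nn as nn

# Illustrative cross-modal translator: maps a 2-D sentiment vector
# (valence, arousal) extracted from audio to a distribution over the
# image classes a generator can draw. The real deepsing architecture
# differs; this only shows the shape of the idea.
NUM_IMAGE_CLASSES = 1000  # assumed, e.g. the ImageNet classes of a GAN

translator = nn.Sequential(
    nn.Linear(2, 64),
    nn.ReLU(),
    nn.Linear(64, NUM_IMAGE_CLASSES),
)

sentiment = torch.tensor([[0.8, 0.4]])           # positive, mildly arousing
class_probs = translator(sentiment).softmax(-1)  # distribution over classes
top_class = class_probs.argmax(-1)               # class the GAN would render
```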

How Deepsing Works

When the model is asked to paint a picture for a song, say “Take Me To Church”, it estimates the emotional valence, between positive and negative, from the highs and lows of the audio and displays matching pictures. In one example from the demo, a palace is shown, which exudes positivity.

There is an option on the website that lets the user play with different moods for the same song, turning, say, the positive palace into a gloomy, negativity-inducing one.
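
That mood toggle can be imagined as overriding the estimated valence before the class mapping runs. A hypothetical sketch, reusing map_sentiment_to_class from the pipeline sketch above:

```python
# Hypothetical mood override: force the sign of the estimated valence
# before mapping it to an image class, so the same audio yields a
# gloomier (or cheerier) visual story. Reuses map_sentiment_to_class
# and CLASS_ATTRIBUTES from the pipeline sketch above.

def restyle(valence, arousal, mood="negative"):
    forced = -abs(valence) if mood == "negative" else abs(valence)
    return map_sentiment_to_class(forced, arousal)

# The same positive, mildly arousing segment lands on opposite classes.
print(restyle(0.8, 0.4, mood="negative"))  # "prison"
print(restyle(0.8, 0.4, mood="positive"))  # "palace"
```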

The authors believe the output could be improved by selecting the class used for content generation according to a semantic similarity measure with the rest of the selected classes, instead of the cardinality-based sampling.
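
That proposal can be sketched as scoring each candidate class by its average semantic similarity to the classes already selected, so the story stays thematically coherent. The toy embeddings below are made up for illustration; a real system would use pretrained vectors such as GloVe or word2vec:

```python
import numpy as np

# Toy word embeddings, invented for illustration only.
EMBEDDINGS = {
    "palace": np.array([0.9, 0.1, 0.3]),
    "castle": np.array([0.8, 0.2, 0.4]),
    "prison": np.array([-0.7, 0.5, 0.1]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_coherent_class(candidates, already_selected):
    # Prefer the candidate most semantically similar to the classes
    # already in the visual story, instead of sampling by frequency.
    return max(
        candidates,
        key=lambda c: np.mean([
            cosine(EMBEDDINGS[c], EMBEDDINGS[s]) for s in already_selected
        ]),
    )

print(most_coherent_class(["castle", "prison"], ["palace"]))  # "castle"
```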

They also suggest that using audio and video estimators to align the feature spaces, along with aligning the attribute spaces, could enrich the generated content and produce more diverse visual stories.

Future Direction

Music video production has adopted a strange custom of using patterns and hallucinatory animations to complement the audio. Videos nowadays are abstract and have become art in themselves; music videos are among the few human-made creations weirder than what GANs produce. An application like Deepsing is ingenious, but it shouldn’t shock the world as much as other AI products such as GPT or GAN-based deepfakes.
