
Now You Can Generate Music From Scratch With OpenAI’s Neural Net Model


OpenAI, one of the most prominent AI research labs, has been working extensively across artificial intelligence, particularly on neural networks and reinforcement learning. Just a few days back, the lab introduced Microscope, a tool for AI enthusiasts interested in exploring how neural networks work.

And now the audio team at OpenAI has introduced a new machine learning model, known as Jukebox, that generates music, singing included, in the raw audio domain. The model takes genre, artist, and lyrics as input and produces new music samples from scratch.

Over the past few years, generative modelling has made groundbreaking progress. One of its crucial goals is to capture the salient features of the data and create new instances that are indistinguishable from the true data.

In this work, the researchers used state-of-the-art deep generative models to build a single system capable of generating diverse, high-fidelity music in the raw audio domain, with long-range coherence spanning multiple minutes. The researchers stated, “We chose to work on music because we want to continue to push the boundaries of generative models.”

Behind Jukebox

Jukebox is a neural network model that generates music, including rudimentary singing, as raw audio in a variety of genres and artists’ styles. Unlike most music-generation models, it models music directly as raw audio. Generating music at the audio level is challenging because the sequences involved are extremely long.
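A quick back-of-the-envelope calculation, using standard CD-quality figures rather than anything specific to Jukebox, shows the scale of the problem:

```python
# Scale of the raw-audio sequence problem: at CD quality there are
# 44,100 samples per second, so a typical 4-minute song is over
# ten million timesteps for a model to handle.
sample_rate = 44_100               # samples per second (44.1 kHz)
song_seconds = 4 * 60              # a typical 4-minute song

timesteps = sample_rate * song_seconds
print(f"{timesteps:,} timesteps")  # -> 10,584,000 timesteps
```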

One way of mitigating the long-input problem is to use an autoencoder that compresses raw audio to a lower-dimensional space by discarding some of the perceptually irrelevant bits of information. Jukebox’s autoencoder compresses audio to a discrete space using a quantisation-based approach called VQ-VAE.

VQ-VAE is an approach that downsamples extremely long context inputs to a shorter sequence of discrete latent codes using vector quantisation. Jukebox uses a hierarchical VQ-VAE architecture to compress audio into a discrete space, along with a loss function designed to retain the maximum amount of musical information.
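As a rough illustration of what that quantisation step does, here is a minimal sketch; the codebook size, vector dimensions, and NumPy implementation are assumptions for illustration only, not Jukebox’s actual code:

```python
import numpy as np

# Minimal sketch of the vector-quantisation step at the heart of a
# VQ-VAE: each continuous encoder output is snapped to its nearest
# codebook vector, so audio is represented by a sequence of discrete
# codes rather than continuous vectors.

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))    # 512 learned codebook vectors
encodings = rng.normal(size=(128, 64))   # encoder outputs for one clip

# Squared L2 distance from every encoding to every codebook vector
dists = ((encodings[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
codes = dists.argmin(axis=1)             # one discrete code per timestep

quantised = codebook[codes]              # vectors passed to the decoder
print(codes[:10])                        # a sequence of integer code IDs
```

The compression itself comes from the encoder producing far fewer timesteps than the raw waveform; the discrete codes then give the generative prior a compact vocabulary to model.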

According to the researchers, while previous work has generated raw-audio music in the 20–30 second range, this new model can generate pieces that are multiple minutes long, with recognisable singing in natural-sounding voices.

Dataset Used

To train the Jukebox model, the researchers crawled the web to curate a new dataset of 1.2 million songs, of which 600,000 are in English. The songs were paired with corresponding lyrics and metadata from LyricWiki; the metadata includes the artist, album, genre, and year of each song, along with common moods or playlist keywords associated with it. The model is trained on 32-bit, 44.1 kHz raw audio, and data augmentation is performed by randomly downmixing the right and left channels to produce mono audio.
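The downmixing augmentation can be pictured with a short sketch; the uniform random weighting below is an assumption on my part, as the source only says the channels are randomly downmixed:

```python
import numpy as np

# Sketch of the mono-downmix augmentation described above: the left
# and right channels are mixed with a random weight to produce a
# single mono channel. The weighting scheme is an assumption.

def random_downmix(stereo, rng):
    """stereo: float array of shape (n_samples, 2) -> mono (n_samples,)."""
    w = rng.uniform(0.0, 1.0)          # random left/right balance
    return w * stereo[:, 0] + (1.0 - w) * stereo[:, 1]

rng = np.random.default_rng(42)
clip = rng.normal(size=(44_100, 2))    # one second of fake stereo audio
mono = random_downmix(clip, rng)
print(mono.shape)                      # (44100,)
```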

Limitations of This Model

The researchers acknowledge that a significant gap remains between these generations and human-created music. Some of the limitations are listed below:

  • The generated songs show local musical coherence, feature impressive solos, and follow traditional chord patterns, but they lack familiar larger-scale structures such as choruses that repeat through a song
  • The downsampling and upsampling process introduces discernible noise; improving the VQ-VAE to capture more musical information would help reduce this
  • Because of the autoregressive nature of sampling, generation is slow: according to the researchers, it takes approximately 9 hours to fully render one minute of audio through the models, so they cannot yet be used in interactive applications (a sketch of why appears after this list)
  • Currently, the model is trained only on English lyrics and mostly Western songs; songs in other languages are yet to be covered
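To see why autoregressive sampling is slow, consider the toy sketch below: each token of output requires a fresh model call conditioned on everything generated so far. `tiny_model` here is a hypothetical stand-in, not Jukebox’s network:

```python
import numpy as np

# Toy illustration of autoregressive sampling: every new token costs
# one full model call conditioned on all tokens generated so far.
# tiny_model is a stand-in that ignores its input and returns a
# random distribution; it is NOT Jukebox's network.

rng = np.random.default_rng(0)

def tiny_model(tokens):
    """Stand-in for a neural net: a distribution over 8 possible tokens."""
    logits = rng.normal(size=8)
    return np.exp(logits) / np.exp(logits).sum()

tokens = [0]                          # start token
for _ in range(16):                   # one full model call per token
    probs = tiny_model(tokens)
    tokens.append(int(rng.choice(8, p=probs)))

print(tokens)                         # 17 tokens cost 16 sequential calls
```

Real raw-audio models face huge numbers of such sequential steps per song, even in compressed code space, which is broadly where the 9-hours-per-minute figure comes from.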

Wrapping Up

OpenAI has been working on generating audio samples conditioned on different kinds of priming information for a few years now. With Jukebox, the researchers hope to improve the musicality of samples conditioned on unique lyrics and to give musicians more control over the generations. They have released the model weights and code, along with a tool for exploring the generated samples.

This is not the first time the San Francisco-based AI research laboratory has applied AI to music. Last year, OpenAI introduced MuseNet, a deep neural network that can generate 4-minute musical compositions with 10 different instruments and combine styles from country to Mozart and the Beatles.

Read the paper here.


Ambika Choudhury

A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.