
Meta Releases Stable Diffusion for Music, MusicGen

When compared to other music models such as Riffusion, Mousai, MusicLM, and Noise2Music, MusicGen demonstrates superior performance in objective and subjective metrics.



Meta’s MusicGen is an AI model that utilises the Transformer architecture to generate new pieces of music based on text prompts. It has the capability to align the generated music with existing melodies, providing a versatile and creative approach to music composition.

Similar to language models, MusicGen predicts the next section in a piece of music rather than the next characters in a sentence. This enables it to generate coherent and structured musical compositions.
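The idea of "predict the next piece, given what came before" can be sketched with a toy autoregressive sampler. Everything here is illustrative: the vocabulary size, context window, and random weight matrix are stand-ins for a trained Transformer, not Meta's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 8     # toy codebook size (real audio codebooks are far larger)
CONTEXT = 4   # toy context window

# Hypothetical stand-in for a trained Transformer: a fixed random matrix
# maps the recent context to a score for every possible next token.
weights = rng.normal(size=(CONTEXT * VOCAB, VOCAB))

def next_token_probs(context):
    """Turn the last CONTEXT tokens into a probability over the next one."""
    one_hot = np.zeros((CONTEXT, VOCAB))
    one_hot[np.arange(CONTEXT), context] = 1.0
    logits = one_hot.reshape(-1) @ weights
    exp = np.exp(logits - logits.max())   # softmax, numerically stable
    return exp / exp.sum()

def generate(prompt, n_new):
    """Autoregressively sample n_new tokens, one at a time."""
    seq = list(prompt)
    for _ in range(n_new):
        probs = next_token_probs(seq[-CONTEXT:])
        seq.append(int(rng.choice(VOCAB, p=probs)))
    return seq

song = generate(prompt=[0, 1, 2, 3], n_new=12)
print(song)
```

Each sampled token is appended to the sequence and fed back in, which is what lets an autoregressive model produce a coherent whole rather than disconnected fragments.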

The audio data used for training is compressed into streams of discrete tokens using Meta’s EnCodec audio tokeniser. This representation allows the model to process several token streams in parallel, making it efficient and fast in generating music.
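EnCodec belongs to the family of residual vector quantizers: each codebook in a stack quantizes whatever error the previous codebook left behind, yielding several parallel token streams per audio frame. The sketch below is a minimal toy version with made-up dimensions and random codebooks, not Meta's trained model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "audio": 100 frames of 16-dim feature vectors (EnCodec quantizes
# learned latent frames; these sizes are invented for illustration).
frames = rng.normal(size=(100, 16))

# A stack of 4 codebooks, each with 32 codewords. Each codebook
# quantizes the residual left over by the previous one.
codebooks = [rng.normal(size=(32, 16)) for _ in range(4)]

def rvq_encode(x, books):
    """Return one token index per codebook per frame, plus the final residual."""
    tokens, residual = [], x.copy()
    for book in books:
        # pick the nearest codeword for each frame's current residual
        dists = ((residual[:, None, :] - book[None, :, :]) ** 2).sum(-1)
        idx = dists.argmin(axis=1)
        tokens.append(idx)
        residual = residual - book[idx]   # pass the remainder to the next book
    return np.stack(tokens), residual

tokens, residual = rvq_encode(frames, codebooks)
print(tokens.shape)   # (4, 100): four parallel token streams
```

Because the four streams describe the same audio frames at increasing levels of detail, a generator can handle them together per timestep instead of one long flat sequence, which is the efficiency the article refers to.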

The training process involved utilising a dataset of 20,000 hours of licensed music, including 10,000 high-quality music tracks from an internal dataset, as well as music data from Shutterstock and Pond5. This extensive training dataset ensures that MusicGen has access to a diverse range of musical styles and compositions.

One of the key features of MusicGen is its ability to handle both text and music prompts. The text prompt sets the basic style, which is then matched with the melody from the audio file. For example, by combining a text prompt describing a specific style of music with the melody of a famous composition, MusicGen can generate a new piece of music that reflects the desired style.
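Matching a melody while restyling it is typically done via a pitch-class ("chroma") summary of the reference audio, which captures which notes are playing while discarding timbre. The toy function below folds an FFT spectrum into 12 pitch classes; the function name and sample rate are illustrative, not taken from Meta's code.

```python
import numpy as np

SR = 22050  # assumed sample rate in Hz

def chroma(signal, sr=SR):
    """Fold an FFT magnitude spectrum into 12 pitch classes (C, C#, ... B)."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    bins = np.zeros(12)
    for f, mag in zip(freqs[1:], spectrum[1:]):   # skip the DC bin
        # MIDI-style pitch: 69 corresponds to A4 = 440 Hz
        pitch = 69 + 12 * np.log2(f / 440.0)
        bins[int(round(pitch)) % 12] += mag
    return bins / (bins.sum() + 1e-9)

# A pure A4 tone should concentrate its energy in pitch class A (index 9).
t = np.arange(SR) / SR
tone = np.sin(2 * np.pi * 440.0 * t)
print(chroma(tone).argmax())   # 9
```

A representation like this lets the model keep the reference melody's note content as conditioning while the text prompt freely determines instrumentation and style.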

It is important to note that while MusicGen can follow a rough guideline from a given prompt, it does not offer precise control over how closely the output tracks the melody, nor can it reproduce an existing melody exactly in different styles. The generated output serves as a creative interpretation rather than an exact replication.

In terms of performance, the researchers experimented with different sizes of the model, ranging from 300 million to 3.3 billion parameters. They found that larger models generally produced higher quality audio, but the 1.5 billion parameter model was rated the best by human evaluators. The 3.3 billion parameter model excelled in accurately matching text input with audio output.

When compared to other music models such as Riffusion, Mousai, MusicLM, and Noise2Music, MusicGen demonstrates superior performance in objective and subjective metrics, which evaluate how well the music matches the text description as well as the plausibility of the composition. Overall, MusicGen ranks higher than Google’s MusicLM, and it could very well be the Stable Diffusion moment for music.
Meta has released the code and models of MusicGen as open source on GitHub, allowing researchers and commercial users to access and utilise the technology. This move encourages further development, collaboration, and innovation in the field of AI-generated music. A demo of MusicGen is also available on the Hugging Face platform, providing a hands-on experience of its capabilities.


Shyam Nandan Upadhyay

Shyam is a tech journalist with expertise in policy and politics, and exhibits a fervent interest in scrutinising the convergence of AI and analytics in society. In his leisure time, he indulges in anime binges and mountain hikes.