Max Planck Releases Moûsai for Text-to-Music Synthesis

Moûsai can generate long-context, high-quality stereo music at 48kHz.

German research lab Max Planck Institute recently released a research paper on Moûsai, a text-to-music model that generates long-context, high-quality 48kHz stereo music lasting beyond the minute mark and can produce a wide variety of music.

The team came up with a new, more efficient way to generate audio in real time. They built a 1D U-Net architecture that can run on a single consumer GPU, which means the model can be trained and run even at universities that lack access to large computing resources.

The team also introduced a new diffusion magnitude autoencoder that compresses the audio signal by a factor of 64 while largely preserving its quality. The autoencoder is then used in the architecture's generation stage, where a text-conditioned diffusion model generates music in this compressed space and the autoencoder decodes it back into audio.
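
To give a sense of scale, the snippet works through the size arithmetic for one- and two-minute clips. It is a back-of-the-envelope sketch that assumes the 64x figure applies to the total count of raw sample values; the paper's exact latent shape is not reproduced, and the helper names are hypothetical, not code from the Moûsai authors.

```python
# Back-of-the-envelope size arithmetic for Moûsai's reported 64x compression.
# Assumption: "64x" is read as (raw sample values) / (compressed values);
# the actual latent layout used in the paper is not modelled here.

SAMPLE_RATE_HZ = 48_000   # output sample rate reported for Moûsai
CHANNELS = 2              # stereo
COMPRESSION_FACTOR = 64   # compression reported for the diffusion magnitude autoencoder


def raw_values(seconds: float) -> int:
    """Number of raw sample values in a stereo clip of the given length."""
    return int(SAMPLE_RATE_HZ * CHANNELS * seconds)


def latent_values(seconds: float) -> int:
    """Approximate number of values after 64x compression (hypothetical helper)."""
    return raw_values(seconds) // COMPRESSION_FACTOR


if __name__ == "__main__":
    for secs in (60, 120):
        print(f"{secs:>4}s: raw={raw_values(secs):,}  latent~{latent_values(secs):,}")
```

Under that assumption, one minute of 48kHz stereo audio (5,760,000 values) would shrink to about 90,000 values, which hints at why running the diffusion model in the compressed space is far cheaper than working on raw audio.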

Generating music is hard because it involves several elements at once: the temporal dimension, long-term structure, multiple overlapping layers of sound, and subtleties that only trained ears can pick up.

Last week, joining Meta, Google also unveiled MusicLM, a generative model that creates high-fidelity music from text descriptions such as "a calming violin melody supported by a distorted guitar riff". MusicLM generates music at 24kHz that remains consistent over several minutes by casting conditional music generation as a hierarchical sequence-to-sequence modelling task.

Diffusion models are becoming increasingly popular, and they are no longer used only for images. These models are now being used to generate videos, speech, and even music from text.

Music synthesis is the latest arena for diffusion models. While there has been some progress, there’s still much more to discover and explore in this exciting field.

Shritama Saha
Shritama is a technology journalist keen to learn about the role AI and analytics play. A graduate in mass communication, she is passionate about exploring the influence of data science on fashion, drug development, films, and art.