
MusicBERT: Microsoft’s Large Scale Pre-Trained Model For Symbolic Music Understanding

Microsoft recently developed a large-scale pre-trained model for symbolic music understanding called MusicBERT. Symbolic music understanding refers to understanding music from symbolic data (for example, the MIDI format). It covers many music applications such as emotion classification, genre classification, and music piece matching.

To develop MusicBERT, Microsoft used the OctupleMIDI encoding method, a bar-level masking strategy, and a large-scale symbolic music corpus of more than one million music tracks.

Why OctupleMIDI?

OctupleMIDI is a novel music encoding method that encodes each note into a tuple of eight elements representing different characteristics of a musical note: instrument, tempo, bar, position, time signature, pitch, duration, and velocity.
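
For intuition, here is a minimal sketch in Python of what an OctupleMIDI-style note tuple could look like. The field names, ordering, and toy values are illustrative assumptions, not the exact token vocabulary used in the paper.

```python
from collections import namedtuple

# Illustrative 8-element note tuple in the spirit of OctupleMIDI.
# The fields follow the eight attributes described above; the real
# MusicBERT vocabularies and value ranges differ from these toy values.
OctupleNote = namedtuple(
    "OctupleNote",
    ["bar", "position", "instrument", "pitch",
     "duration", "velocity", "time_signature", "tempo"],
)

# A single note: bar 0, beat position 0, piano (program 0), middle C,
# a quarter-note duration, moderate velocity, 4/4 time, 120 BPM.
note = OctupleNote(bar=0, position=0, instrument=0, pitch=60,
                   duration=8, velocity=80, time_signature="4/4", tempo=120)

# One tuple per note keeps the sequence length equal to the number of
# notes, which is roughly what makes it about 4x shorter than REMI-style
# encodings that spend several tokens on each note.
print(note)
```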

Here are some of the advantages of OctupleMIDI: 

  • It reduces the length of a music sequence (about 4x shorter than REMI), which makes the sequences easier for a Transformer to model, given that music sequences are otherwise very long.
  • It is ‘note’ centric. Every note is encoded with the same eight-element tuple structure, which carries enough information (time signature, long note durations, etc.) to express various kinds of music, making the encoding much simpler to work with.
  • It is more universal than previous encoding methods, since the fixed 8-tuple structure per note can express different music genres.

Different encoding methods for symbolic music understanding (Source: arXiv) 

MusicBERT architecture 

The authors of the study established that it is challenging to apply NLP methods directly to symbolic music because it differs greatly from natural text data. The following challenges arise:

  • Music songs are more structural and diverse than natural language, making them more difficult to encode.
  • The complicated encoding of symbolic music increases the risk of information leakage during pre-training.
  • Pre-training for music understanding is limited by the lack of large-scale symbolic music corpora.

To address this, researchers Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, and Tie-Yan Liu developed MusicBERT, a large-scale pre-trained model that combines a dedicated music encoding with a masking strategy for music understanding. The model is evaluated on symbolic music understanding tasks, including melody completion, accompaniment suggestion, style classification, and genre classification.

Besides OctupleMIDI, MusicBERT uses a bar-level masking strategy. The masking strategy in the original BERT for NLP tasks randomly masks individual tokens, which causes information leakage in music pre-training. In the bar-level masking strategy used in MusicBERT, all the tokens of the same type (for example, time signature, instrument, or pitch) within a bar are masked together, avoiding information leakage and yielding better representation learning.
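
As a rough illustration, the snippet below sketches the idea of bar-level masking on a toy sequence of notes: one element type (here, pitch) is masked for every note in a chosen bar at once, so the model cannot simply copy the answer from an unmasked neighbour in the same bar. This is a simplified sketch of the idea, not the exact MusicBERT implementation, and the field names are assumptions.

```python
MASK = "<mask>"

# Toy sequence: each "note" is a dict carrying its bar index and a few
# of its octuple elements (hypothetical field names for illustration).
notes = [
    {"bar": 0, "pitch": 60, "duration": 4, "velocity": 80},
    {"bar": 0, "pitch": 64, "duration": 4, "velocity": 80},
    {"bar": 1, "pitch": 67, "duration": 8, "velocity": 90},
    {"bar": 1, "pitch": 72, "duration": 8, "velocity": 90},
]

def bar_level_mask(notes, element, bar_to_mask):
    """Mask the chosen element for every note that falls in the chosen bar."""
    masked = []
    for note in notes:
        note = dict(note)
        if note["bar"] == bar_to_mask:
            note[element] = MASK
        masked.append(note)
    return masked

# Mask every pitch token in bar 1 at once; a naive token-level scheme might
# mask only one of them and let the model recover it from the unmasked
# pitch of a neighbouring note in the same bar.
print(bar_level_mask(notes, element="pitch", bar_to_mask=1))
```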

In addition to this, MusicBERT uses a large-scale and diverse symbolic music dataset called the Million MIDI Dataset (MMD). It contains more than one million songs across different genres, including rock, classical, rap, electronic, and jazz. With about 1,524,557 songs and roughly two billion notes, it is one of the most extensive datasets in the current literature, about ten times larger in number of songs than the previous largest dataset, LMD (148,403 songs and 535 million notes). This dataset significantly benefits representation learning for music understanding.

Model structure of MusicBERT (Source: arXiv) 

Further, the model is fine-tuned on four downstream tasks, namely melody completion, accompaniment suggestion, style classification and genre classification, and compared against baseline models such as melody2vec, tonnetz, pianoroll and PiRhDy. MusicBERT shows substantial improvements over these baselines in both its small and base configurations.
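
As a hedged sketch of what fine-tuning such a pre-trained encoder for one of these classification tasks (say, genre classification) might look like in PyTorch: the encoder class, hidden size, number of genres, and mean pooling below are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

# Illustrative fine-tuning head for genre classification on top of a
# pre-trained encoder. ToyEncoder stands in for a MusicBERT-style
# Transformer whose output is assumed to be (batch, seq_len, hidden).
class ToyEncoder(nn.Module):
    def __init__(self, vocab_size=1000, hidden_size=768):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)

    def forward(self, token_ids):
        return self.embed(token_ids)              # (B, T, H)

class GenreClassifier(nn.Module):
    def __init__(self, encoder, hidden_size=768, num_genres=13):
        super().__init__()
        self.encoder = encoder
        self.classifier = nn.Linear(hidden_size, num_genres)

    def forward(self, token_ids):
        hidden = self.encoder(token_ids)          # (B, T, H)
        pooled = hidden.mean(dim=1)               # simple mean pooling over tokens
        return self.classifier(pooled)            # genre logits

model = GenreClassifier(ToyEncoder())
logits = model(torch.randint(0, 1000, (2, 32)))   # batch of 2 toy sequences
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 3]))
loss.backward()                                    # fine-tuning step (sans optimizer)
```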

The below table shows the results of MusicBERT versus other models.

(Source: arXiv)

Conclusion 

MusicBERT achieves state-of-the-art performance on all four evaluated symbolic music understanding tasks. In the coming months, the team will attempt to apply MusicBERT to other tasks, such as structure analysis and chord recognition, to further boost the model's performance.


Amit Raja Naik

Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.