MusicBERT: Microsoft’s Large Scale Pre-Trained Model For Symbolic Music Understanding


Published on June 22, 2021

by Amit Raja Naik

Microsoft recently developed a large-scale pre-trained model for symbolic music understanding called MusicBERT. Symbolic music understanding refers to understanding music from symbolic data (for example, the MIDI format). It covers many music applications, such as emotion classification, genre classification, and music piece matching.

To develop MusicBERT, Microsoft used the OctupleMIDI encoding method, a bar-level masking strategy, and a large-scale symbolic music corpus of more than 1 million songs.

Why OctupleMIDI?

OctupleMIDI is a novel music encoding method that encodes each note as a tuple of eight elements, each representing a different characteristic of the note: instrument, tempo, bar, position, time signature, pitch, duration, and velocity.
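As a rough illustration of the eight-element tuple described above, a note could be modelled as follows. This is a minimal sketch, not the authors' implementation: the class name `OctupleToken`, the field types, and the example values are all assumptions made for this example.

```python
from typing import NamedTuple


class OctupleToken(NamedTuple):
    """One note as eight fields (names follow the article; types are assumed)."""
    instrument: int      # e.g. a MIDI program number
    tempo: int           # tempo value or bucket
    bar: int             # index of the bar the note falls in
    position: int        # onset position inside the bar
    time_signature: str  # e.g. "4/4"
    pitch: int           # MIDI pitch number
    duration: int        # note length in some quantised unit
    velocity: int        # loudness of the note


# Two hypothetical notes in bar 0: a C4 followed by an E4.
notes = [
    OctupleToken(0, 120, 0, 0, "4/4", 60, 8, 80),
    OctupleToken(0, 120, 0, 8, "4/4", 64, 8, 80),
]

# One token per note, so the sequence length equals the note count.
print(len(notes), len(notes[0]))
```

Because every note is a single token with the same eight fields, the sequence grows by one element per note rather than by several.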

Here are some of the advantages of OctupleMIDI:

It reduces the length of a music sequence (4x shorter than REMI), which makes music sequences, already very long, easier for a Transformer to model.
It is note-centric. Each note has the same eight-element structure and carries enough information, such as time signature and long note durations, to express various music genres, which makes OctupleMIDI simpler than previous encoding methods.
It is more universal than previous encoding methods, since the same 8-tuple structure can express different music genres.

Different encoding methods for symbolic music understanding (Source: arXiv)

MusicBERT architecture

The authors of the study found that it is challenging to apply NLP pre-training techniques directly to symbolic music because it differs greatly from natural-language text. The challenges are as follows:

Music is more structural and diverse than natural language, which makes it more difficult to encode.
Because the encoding of symbolic music is complicated, information leakage is more likely during pre-training.
Pre-training for music understanding is limited by the lack of large-scale symbolic music corpora.

To address these challenges, researchers Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, and Tie-Yan Liu developed MusicBERT, a large-scale pre-trained model with a music encoding and masking strategy designed for music understanding. The model is evaluated on symbolic music understanding tasks, including melody completion, accompaniment suggestion, style classification, and genre classification.

Besides OctupleMIDI, MusicBERT uses a bar-level masking strategy. The masking strategy in the original BERT for NLP tasks randomly masks individual tokens. In music pre-training this causes information leakage, because neighbouring notes in a bar often share the same attribute values, so a masked value can be read off the notes next to it. In the bar-level masking strategy used in MusicBERT, all the tokens of the same type (for example, time signature, instrument, or pitch) within a bar are masked together, which avoids information leakage and supports representation learning.
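The idea of masking one attribute type across a whole bar can be sketched as below. This is a simplified illustration under stated assumptions, not the procedure from the paper: the function name `bar_level_mask`, the `[MASK]` placeholder, the dictionary note format, and the `mask_prob` value are all hypothetical.

```python
import random

FIELDS = ("instrument", "tempo", "bar", "position",
          "time_signature", "pitch", "duration", "velocity")
MASK = "[MASK]"  # placeholder symbol, assumed for this sketch


def bar_level_mask(notes, mask_prob=0.15, seed=None):
    """Mask whole (bar, field) groups rather than individual elements.

    notes: list of dicts keyed by FIELDS.
    Returns (masked_notes, chosen), where chosen is the set of
    (bar, field) pairs that were masked.
    """
    rng = random.Random(seed)
    bars = sorted({n["bar"] for n in notes})
    # Decide once per (bar, field) pair, so every note in that bar
    # loses the same attribute and neighbours cannot leak it.
    chosen = {(b, f) for b in bars for f in FIELDS
              if rng.random() < mask_prob}
    masked = [
        {f: (MASK if (n["bar"], f) in chosen else n[f]) for f in FIELDS}
        for n in notes
    ]
    return masked, chosen


# Three hypothetical notes: two in bar 0, one in bar 1.
notes = [
    dict(zip(FIELDS, (0, 120, 0, 0, "4/4", 60, 8, 80))),
    dict(zip(FIELDS, (0, 120, 0, 8, "4/4", 64, 8, 80))),
    dict(zip(FIELDS, (0, 120, 1, 0, "4/4", 67, 16, 80))),
]
masked, chosen = bar_level_mask(notes, mask_prob=0.3, seed=0)
for note in masked:
    print(note)
```

In the output, whenever a field is masked for one note in a bar it is masked for every note in that bar, which is the property that separates this from element-wise random masking.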

In addition, MusicBERT is pre-trained on a large-scale and diverse symbolic music dataset called the Million MIDI Dataset (MMD). It contains more than 1 million songs across different genres, including Rock, Classical, Rap, Electronic, and Jazz. MMD has 1,524,557 songs and about two billion notes, making it one of the most extensive datasets in the current literature: roughly ten times larger, in number of songs, than the previous largest dataset, LMD (148,403 songs and 535 million notes). This dataset significantly benefits representation learning for music understanding.
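The size comparison can be checked with the figures quoted above. The note count for MMD is taken as the rounded "two billion" given in the text, so the second ratio is approximate.

```python
# Figures as quoted in the article.
mmd_songs, lmd_songs = 1_524_557, 148_403
mmd_notes, lmd_notes = 2_000_000_000, 535_000_000  # MMD notes rounded

song_ratio = mmd_songs / lmd_songs
note_ratio = mmd_notes / lmd_notes

print(f"songs: {song_ratio:.1f}x")  # songs: 10.3x
print(f"notes: {note_ratio:.1f}x")  # notes: 3.7x
```

The "ten times larger" figure therefore applies to the number of songs; measured in notes, the gap is closer to four times.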

Model structure of MusicBERT (Source: arXiv)

Further, the model is fine-tuned on four downstream tasks (melody completion, accompaniment suggestion, style classification, and genre classification) and compared against baseline models such as melody2vec, tonnetz, pianoroll, PiRhDy, and others. Both the small and base versions of MusicBERT show significant improvement over the baseline models.

The table below shows the results of MusicBERT versus other models.

(Source: arXiv)

Conclusion

MusicBERT achieves state-of-the-art performance on all four evaluated symbolic music understanding tasks. In the coming months, the team plans to apply MusicBERT to other music understanding tasks, such as structure analysis and chord recognition.


Amit Raja Naik

Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.