Can Word2Vec Model Spill The Secrets Of Mozart's Music?

Ram Sagar

It is genius because it can neither be decoded nor mimicked. It can be remade and copied, but it can never again be unique. To laymen, the works of great composers fall somewhere between mood swings and status signaling. But one cannot, in all seriousness, comprehend what went into writing the Requiem or why Tchaikovsky fired cannons in his 1812 Overture.

The turn of this century witnessed a new form of intelligence: an augmentation of human ideas with the computational power of machines. These machines have now become massive data-driven engines, and with every innovation in the algorithms, they got better.

Machine learning models like neural networks devour tonnes of data and churn out results without tiring. These algorithms have enabled us to see hidden correlations, which in turn have been exploited to build better models.

Word2vec is one such ingenious model, used to explore the semantic relationships amongst words in a document, a movie review or a President's speech.

A Brief Look At Word2Vec

Word2Vec embeds words in a lower-dimensional vector space using a shallow neural network. The result is a set of word-vectors in which vectors that are close together in vector space have similar meanings based on context, while word-vectors distant from each other have differing meanings. For example, apple and orange would be close together, while apple and gravity would be relatively far apart. There are two versions of this model, based on skip-grams (SG) and continuous bag-of-words (CBOW).

The potential of word2vec is well documented with regard to work done on text. Now researchers from the University of Miami, USA and the Singapore University of Technology have used word2vec to capture meaningful relationships in complex polyphonic music in a high-dimensional vector space, for example, to learn how the statistical properties of audio influence the emotional responses of listeners.

The widely popular cosine distance is used as the metric to distinguish between functional and chord relationships, as well as the harmonic associations in the music.
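Cosine distance can be sketched in a few lines of numpy. The word vectors below are made-up toy values, purely to illustrate the apple/orange/gravity intuition mentioned earlier; they are not from any trained model.

```python
import numpy as np

def cosine_distance(u, v):
    """Cosine distance = 1 - cosine similarity of two vectors."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy word vectors (hypothetical, for illustration only)
apple   = np.array([1.0, 0.9, 0.1])
orange  = np.array([0.9, 1.0, 0.2])
gravity = np.array([0.1, 0.2, 1.0])

# Contextually similar words sit closer together
print(cosine_distance(apple, orange) < cosine_distance(apple, gravity))  # True
```

Because cosine distance depends only on the angle between vectors, it ignores vector magnitude, which is why it is a common choice for comparing embeddings.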

As usual when building word2vec models, continuous bag-of-words and skip-gram are the two candidate training approaches.

Though music lacks the referential semantics of language, the motivation to experiment with word2vec comes from the sequential nature in which events occur in a piece of music. Just like grammar, these events can be seen as following rules particular to a certain kind of music. Human expression, be it speech or song, can be sophisticated, but it has structural properties at the lower levels, and the researchers aimed to exploit this.

Shredding And Slicing

Speech can change from land to land: dialects and intonation vary, giving stark contrasts in word usage. In the case of a song, different instruments can be used to produce the same sound, and a song can consist of simultaneously produced sounds. To cater for both the sequential and the simultaneous nature of music, the researchers propose the 'musical slice' as the smallest unit of music. A slice is based upon factors like the time signature of a certain sound, the number of tones, etc.

Based on this, the corpus used for modeling is segmented into equal-length slices.
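A minimal sketch of this segmentation step is given below. The note format (onset, duration, MIDI pitch) and the one-beat slice length are assumptions for illustration; the paper's actual slicing is richer. Each slice here is simply the set of pitch classes sounding in that time window.

```python
def slice_piece(notes, slice_len=1.0):
    """Segment note events into equal-length 'musical slices'.

    notes: list of (onset_time, duration, midi_pitch) tuples.
    Returns one sorted tuple of pitch classes (0-11) per time window.
    """
    end = max(t + d for t, d, _ in notes)
    n_slices = int(end // slice_len) + (1 if end % slice_len else 0)
    slices = []
    for i in range(n_slices):
        lo, hi = i * slice_len, (i + 1) * slice_len
        # A note contributes to a slice if it sounds anywhere in [lo, hi)
        sounding = {p % 12 for t, d, p in notes if t < hi and t + d > lo}
        slices.append(tuple(sorted(sounding)))
    return slices

# A C major chord (C4, E4, G4) held for two beats, then G4 alone for one beat
notes = [(0.0, 2.0, 60), (0.0, 2.0, 64), (0.0, 2.0, 67), (2.0, 1.0, 67)]
print(slice_piece(notes))  # [(0, 4, 7), (0, 4, 7), (7,)]
```

Treating each slice tuple as a "word" turns a polyphonic piece into a sequence that a word2vec-style model can consume.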

Source: Paper by Chuan et al. (2018)

The above figure illustrates the slices on a sample piece by one of the greatest composers, Chopin.

The dataset used for this experiment is a MIDI dataset containing a total of 130,000 pieces from eight genres, including classical, metal, etc.

Just as frequently occurring words are weighted more in NLP models, modeling in this case was done on the 500 most frequent slices (musical words) out of a total of 4,076 unique slices, and the less frequent slices were replaced with a dummy word.
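This vocabulary-pruning step can be sketched with a standard frequency count. The token names and the "UNK" dummy word below are illustrative choices, not the paper's notation.

```python
from collections import Counter

def build_vocab(tokens, top_k=500, unk="UNK"):
    """Keep the top_k most frequent tokens; map the rest to a dummy word."""
    keep = {tok for tok, _ in Counter(tokens).most_common(top_k)}
    return [tok if tok in keep else unk for tok in tokens]

# Toy corpus of slice identifiers (hypothetical)
corpus = ["a", "b", "a", "c", "a", "b", "d"]
print(build_vocab(corpus, top_k=2))  # ['a', 'b', 'a', 'UNK', 'a', 'b', 'UNK']
```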

The hyperparameters used were:

- Learning rate = 0.1
- Skip window size = 4
- Number of training steps = 1,000,000
- Number of dimensions = 256


Written music uses labels like C major and G major, along with Roman numerals like IV, to indicate the functional role a chord plays in a key.

Here, the word2vec model was used to examine the distance between these representations. To do this, each chord is mapped to its pitch classes to find the corresponding slices. The geometric position of each slice is then identified in the word2vec vector space, and in this way the cosine distance is calculated for a pair of chords.
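The chord-to-pitch-class mapping can be sketched as follows. The pitch-class numbering (C = 0 through B = 11) is standard; restricting to major triads is a simplification for illustration.

```python
# Pitch classes of the natural notes (C=0, C#=1, ..., B=11)
PITCH = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def major_triad(root):
    """Pitch classes of a major triad: root, major third, perfect fifth."""
    r = PITCH[root]
    return frozenset({r, (r + 4) % 12, (r + 7) % 12})

print(sorted(major_triad("C")))  # [0, 4, 7]
print(sorted(major_triad("G")))  # [2, 7, 11]
```

Once a chord's pitch-class set matches a slice in the vocabulary, that slice's learned vector stands in for the chord, and cosine distance can be computed between chord pairs.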

The cosine distances are mapped as a heat map, where colours represent distance; for example, red/orange means high distance. The pattern that emerges reflects the tonal relationships between keys and can be verified against established music theory.

Like analogies in language, word2vec demonstrates the functional relationships amongst the chords: the C major triad is to the G major triad as G is to D. These insights were drawn from a model trained on musical words (slices) embedded in a high-dimensional vector space.
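The analogy can be illustrated with the usual vector arithmetic. The toy embeddings below are fabricated: keys are simply placed at consecutive positions along the circle of fifths so that the offset from C to G equals the offset from G to D. Real word2vec vectors only learn such structure approximately.

```python
import numpy as np

# Toy 2-D embeddings: position along the circle of fifths (hypothetical)
fifths = ["C", "G", "D", "A", "E"]
vec = {k: np.array([float(i), 1.0]) for i, k in enumerate(fifths)}

def nearest(v, vocab):
    """Return the key whose vector is closest (Euclidean) to v."""
    return min(vocab, key=lambda k: np.linalg.norm(vocab[k] - v))

# "C is to G as G is to ?"  ->  vec(G) - vec(C) + vec(G)
print(nearest(vec["G"] - vec["C"] + vec["G"], vec))  # D
```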

“It should be noted that our aim was not to create a full-fledged music generation system, but rather, to illustrate how word2vec might be useful in a music generation context,” observe the researchers in their paper.

Until now, word2vec was popular for word embeddings and NLP tasks. This experiment demonstrates the model's reach into domains unforeseen by its creators. Experiments such as these are not meant to mock human creativity, but to acknowledge how deeply complex it is, and to continue investigating the reach of algorithm-driven thinking, which can lead us down the road less travelled.

Read more about the experiment here.


Copyright Analytics India Magazine Pvt Ltd
