Generating Piano Music With Score2Perf


Making music is the process of arranging sounds and tones in a particular order, often layering them into a unified composition. Music is made of sounds, vibrations, and silences, and it does not always have to be pleasant to listen to. It can vividly convey a whole range of experiences, environments, and emotions. Human cultures have always made music: early instruments such as flutes and drums date back thousands of years, ancient civilizations used music in religious ceremonies, and many cultures around the world have drumming traditions for important rituals. In the 21st century, rock and pop musicians tour and perform worldwide, singing the songs that made them famous.

But what if I told you that human involvement might soon no longer be required to compose music? Scary, isn’t it? The idea that artificial intelligence alone can compose music is unsettling and unacceptable to many people. Yet music-making AI software has advanced so much in the past few years that the thought is no longer far-fetched. Entire industries are being built around AI services for creating music, most of which rely on deep learning networks, a type of AI that learns by analyzing large amounts of data. The software is fed large amounts of source material, which it analyzes to find patterns. For example, it picks up on musical building blocks such as chords, tempo, length, and how notes relate to one another, gradually learning from the input to compose its own melodies.

This raises an important question: could artificial intelligence one day replace musicians? Will all the melodies of the distant future be created by software living inside machines? Using software to make music or assist musicians is not new, however. As far back as the ’90s, the English singer-songwriter and actor David Bowie used an application known as the Verbasizer, which took lyric source material as input and randomly reordered the words to create new combinations, producing all-new lyrics.

About Music Transformers 

Generating long pieces of music automatically is challenging because music has structure at multiple timescales, from millisecond-level timing through motifs and phrases to the repetition of entire sections. Music Transformer is designed to help with exactly this. It is an attention-based neural network that can generate music with improved long-term coherence. It uses an event-based representation that lets it generate expressive performances directly, without first generating a score. In contrast to LSTM-based models such as Performance RNN, which compress past events into a fixed-size hidden state, Music Transformer has direct access to all earlier events.
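
To make the event-based representation more concrete, here is a minimal sketch that flattens a few notes into a stream of events. It loosely follows the vocabulary described for Performance RNN (note-on, note-off, time-shift, and velocity events); the function name notes_to_events, the tuple layout, and the three constants are assumptions made for illustration, not Magenta's actual encoder.

```python
# Simplified performance-style event encoding (illustrative only).
TIME_STEP = 0.01       # assumed resolution: 10 ms per step
MAX_SHIFT_STEPS = 100  # assumed cap: one second per time-shift event
VELOCITY_BINS = 32     # assumed number of velocity buckets


def notes_to_events(notes):
    """Convert (pitch, start, end, velocity) tuples into a list of event strings."""
    # Collect note boundaries. The second field makes note-offs sort before
    # note-ons that fall on the same step.
    boundaries = []
    for pitch, start, end, velocity in notes:
        boundaries.append((round(start / TIME_STEP), 1, 'on', pitch, velocity))
        boundaries.append((round(end / TIME_STEP), 0, 'off', pitch, velocity))
    boundaries.sort()

    events = []
    current_step = 0
    current_bin = None
    for step, _, kind, pitch, velocity in boundaries:
        # Advance time, splitting long gaps into several time-shift events.
        gap = step - current_step
        while gap > 0:
            shift = min(gap, MAX_SHIFT_STEPS)
            events.append('TIME_SHIFT_%d' % shift)
            gap -= shift
        current_step = step
        if kind == 'on':
            # Emit a velocity event only when the bucket changes.
            velocity_bin = velocity * VELOCITY_BINS // 128
            if velocity_bin != current_bin:
                events.append('VELOCITY_%d' % velocity_bin)
                current_bin = velocity_bin
            events.append('NOTE_ON_%d' % pitch)
        else:
            events.append('NOTE_OFF_%d' % pitch)
    return events


# Two notes: middle C for half a second, then E for one and a half seconds.
example_events = notes_to_events([(60, 0.0, 0.5, 80), (64, 0.5, 2.0, 80)])
print(example_events)
```

Because a single time-shift event can stand for anything from 10 ms to a full second in this sketch, the same number of events can cover very different amounts of music, which is why event count and duration are only loosely related.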

The original Transformer captures self-reference through attention but relies on absolute timing signals, so it has a hard time tracking regularities that depend on relative distances, event ordering, and periodicity. Music Transformer instead uses relative attention, which modulates attention according to how far apart the two tokens being compared are, so the model can focus more on relational features. Relative self-attention also lets the model generalize beyond the length of the training examples, which the original Transformer does not do well. Its memory-efficient algorithm for relative self-attention dramatically reduces the memory footprint, allowing it to scale to musical sequences on the order of minutes. The model can also skip over less relevant sections when attending to earlier material.
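
As described in the Music Transformer paper, the memory saving comes from not building a separate relative embedding for every query-key pair. The NumPy sketch below illustrates the idea for a single attention head with causal masking: the naive version looks up one embedding per (i, j) pair, while the "skewed" version obtains the same lower-triangular values from one matrix product followed by a pad, a reshape, and a slice. The function names and the layout of rel_emb are assumptions for illustration; this is not the Magenta or Tensor2Tensor implementation.

```python
import numpy as np


def relative_logits_naive(q, rel_emb):
    """S[i, j] = q[i] . rel_emb[j - i + L - 1] for j <= i, zero elsewhere."""
    length = q.shape[0]
    out = np.zeros((length, length))
    for i in range(length):
        for j in range(i + 1):
            out[i, j] = q[i] @ rel_emb[j - i + length - 1]
    return out


def relative_logits_skewed(q, rel_emb):
    """Same values for j <= i, computed without per-pair embedding lookups."""
    length = q.shape[0]
    qe = q @ rel_emb.T                      # (L, L); column r is distance r - (L - 1)
    padded = np.pad(qe, ((0, 0), (1, 0)))   # prepend a zero column -> (L, L + 1)
    reshaped = padded.reshape(length + 1, length)
    return reshaped[1:]                     # drop the first row -> (L, L)


rng = np.random.default_rng(0)
seq_len, depth = 6, 4
q = rng.normal(size=(seq_len, depth))
# Row r of rel_emb encodes relative distance r - (seq_len - 1); the last row is distance 0.
rel_emb = rng.normal(size=(seq_len, depth))

naive = relative_logits_naive(q, rel_emb)
skewed = relative_logits_skewed(q, rel_emb)

# Positions above the diagonal are masked out in causal attention, so only the
# lower triangle has to agree.
print(np.allclose(np.tril(naive), np.tril(skewed)))
```

In a full attention layer these relative logits would be added to the usual query-key logits before the causal mask and softmax are applied.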

What is Score2Perf?

Score2Perf is the part of the Magenta library that provides access to Music Transformer. It is used to train symbolic music models on a representation that retains the performance characteristics of the original recordings, and the pretrained models used here were trained on thousands of musical recordings and their transcriptions. The models can be controlled in a few different ways, for example by conditioning them on a melody, or simply asked to generate new performances from scratch.

Getting Started with Generating Piano Music with Score2Perf

We will build an automated piano music generator using Music Transformer and Score2Perf. Rather than training a model ourselves, we will load pretrained checkpoints along with a few sample piano primers, then generate a performance from scratch, continue a priming sequence, and add accompaniment to a melody.

The code below follows the official implementation, whose tutorial you can find here.

Setting Up The Environment

We first install the dependencies needed to generate piano music. We copy the sample primers and a piano SoundFont from the Magenta cloud storage bucket, then install Magenta, an open-source library powered by TensorFlow, together with the FluidSynth synthesizer used for playback.

 #Importing Primers
 print('Copying Salamander piano SoundFont from GCS...')
 !gsutil -q -m cp -r gs://magentadata/models/music_transformer/primers/* /content/
 !gsutil -q -m cp gs://magentadata/soundfonts/Yamaha-C5-Salamander-JNv5.1.sf2 /content/
 print('Installing dependencies...')
 !apt-get update -qq && apt-get install -qq libfluidsynth1 build-essential libasound2-dev libjack-dev
 !pip install -q 'tensorflow-datasets < 4.0.0'
 !pip install -qU google-cloud magenta pyfluidsynth

Importing the Libraries

Next, we import the score2perf library and the modules it depends on:

 #Importing Libraries
 import numpy as np
 import os
 import tensorflow.compat.v1 as tf
 from google.colab import files
 from tensor2tensor import models
 from tensor2tensor import problems
 from tensor2tensor.data_generators import text_encoder
 from tensor2tensor.utils import decoding
 from tensor2tensor.utils import trainer_lib
 from magenta.models.score2perf import score2perf
 import note_seq
 # Tensor2Tensor relies on TF1-style graph execution.
 tf.disable_v2_behavior()

Setting up our Encoder & Decoder

We first encode the input music sequence into tokens with the encoder, then generate a new sequence of tokens with the Music Transformer model, and finally decode the generated tokens back into a note sequence with the same encoder, which gives us our output. We also define the path to the SoundFont and the audio sample rate that the synthesizer uses for playback.
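
The encode, generate, decode round trip described above can be sketched with a toy encoder. ToyEventEncoder, decode_until_eos, and the reserved ids below are hypothetical stand-ins chosen to mirror the usual Tensor2Tensor layout (padding is 0, end-of-sequence is 1); the real Score2Perf encoder works on NoteSequences, and the article's code treats its decode result as a MIDI filename rather than a list of strings.

```python
PAD_ID = 0        # assumed id for padding
EOS_ID = 1        # assumed id for end-of-sequence
NUM_RESERVED = 2  # ids below this value are reserved


class ToyEventEncoder:
    """Maps event strings to integer ids and back (hypothetical stand-in)."""

    def __init__(self, vocab):
        self._vocab = list(vocab)
        self._index = {event: i + NUM_RESERVED
                       for i, event in enumerate(self._vocab)}

    def encode(self, events):
        # Append the end-of-sequence id, as the article's problem class requests.
        return [self._index[event] for event in events] + [EOS_ID]

    def decode(self, ids):
        return [self._vocab[i - NUM_RESERVED] for i in ids]


def decode_until_eos(ids, encoder):
    """Cut the id list at the first end-of-sequence id, then decode it."""
    ids = list(ids)
    if EOS_ID in ids:
        ids = ids[:ids.index(EOS_ID)]
    return encoder.decode(ids)


encoder = ToyEventEncoder(['NOTE_ON_60', 'TIME_SHIFT_50', 'NOTE_OFF_60'])

# Encode a primer, then drop its end token so a model could continue from it.
primer_ids = encoder.encode(['NOTE_ON_60', 'TIME_SHIFT_50'])
targets = primer_ids[:-1]

# Pretend the model appended one event, an end token, and some padding.
generated_ids = targets + [4, EOS_ID, PAD_ID, PAD_ID]
decoded = decode_until_eos(generated_ids, encoder)
print(decoded)
```

Truncating at the first end-of-sequence id matters because the model output is a fixed-length array, and everything after that id is padding rather than music.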

 #Setting Path and sample rate of music
 SF2_PATH = '/content/Yamaha-C5-Salamander-JNv5.1.sf2'
 SAMPLE_RATE = 16000
 # Upload a MIDI file and convert to NoteSequence.
 def upload_midi():
   data = list(files.upload().values())
   if len(data) > 1:
     print('Multiple files uploaded; using only one.')
   return note_seq.midi_to_note_sequence(data[0])
 # Decode a list of IDs.
 def decode(ids, encoder):
   ids = list(ids)
   if text_encoder.EOS_ID in ids:
     ids = ids[:ids.index(text_encoder.EOS_ID)]
   return encoder.decode(ids)

Creating a Piano Performance Language Model

Set up generation by loading the parameters of an unconditional Transformer model.
 #setting the transformer
 model_name = 'transformer'
 hparams_set = 'transformer_tpu'
 ckpt_path = 'gs://magentadata/models/music_transformer/checkpoints/unconditional_model_16.ckpt'
 class PianoPerformanceLanguageModelProblem(score2perf.Score2PerfProblem):
   @property
   def add_eos_symbol(self):
     return True
 problem = PianoPerformanceLanguageModelProblem()
 unconditional_encoders = problem.get_feature_encoders()
 # Set up HParams.
 hparams = trainer_lib.create_hparams(hparams_set=hparams_set)
 trainer_lib.add_problem_hparams(hparams, problem)
 hparams.num_hidden_layers = 16
 hparams.sampling_method = 'random'
 # Set up decoding HParams.
 decode_hparams = decoding.decode_hparams()
 decode_hparams.alpha = 0.0
 decode_hparams.beam_size = 1
 # Create an Estimator.
 run_config = trainer_lib.create_run_config(hparams)
 estimator = trainer_lib.create_estimator(
     model_name, hparams, run_config,
     decode_hparams=decode_hparams)
 # Create input generator (so we can adjust priming and
 # decode length on the fly).
 def input_generator():
   global targets
   global decode_length
   while True:
     yield {
         'targets': np.array([targets], dtype=np.int32),
         'decode_length': np.array(decode_length, dtype=np.int32)
     }
 # These values will be changed by subsequent cells.
 targets = []
 decode_length = 0
 # Start the Estimator, loading from the specified checkpoint.
 input_fn = decoding.make_input_fn_from_generator(input_generator())
 unconditional_samples = estimator.predict(
     input_fn, checkpoint_path=ckpt_path)
 # "Burn" one.
 _ = next(unconditional_samples) 
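
The input_generator above reads targets and decode_length each time it is asked for a new example, which is what lets later cells change the primer and the length without rebuilding the model. The toy sketch below shows only that Python pattern; it uses a plain dictionary named state instead of module-level globals, and it says nothing about how the Estimator schedules its reads.

```python
# Shared, mutable request state (a stand-in for the article's globals).
state = {'targets': [], 'decode_length': 0}


def input_generator():
    """Yield a new request each time, built from the current shared state."""
    while True:
        yield {
            'targets': list(state['targets']),
            'decode_length': state['decode_length'],
        }


requests = input_generator()

# The first request reflects the initial, empty state.
first = next(requests)

# Updating the state changes what the next request looks like.
state['targets'] = [5, 6, 7]
state['decode_length'] = 1024
second = next(requests)

print(first)
print(second)
```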

Generating a Performance from Scratch

We will now generate a piano performance from scratch. This might take a minute or so, depending on the length of the performance the model ends up generating. Because the model uses a representation in which each event corresponds to a variable amount of time, the actual number of seconds generated may vary.

 #setting the decode length
 targets = []
 decode_length = 1024
 # Generate sample events.
 sample_ids = next(unconditional_samples)['outputs']
 # Decode to NoteSequence.
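 midi_filename = decode(
     sample_ids,
     encoder=unconditional_encoders['targets'])
 unconditional_ns = note_seq.midi_file_to_note_sequence(midi_filename)
 # Play and plot.
 note_seq.play_sequence(
     unconditional_ns,
     synth=note_seq.fluidsynth, sample_rate=SAMPLE_RATE, sf2_path=SF2_PATH)
 note_seq.plot_sequence(unconditional_ns)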

Running this cell produces an audio player and a piano-roll plot of the generated performance.

Choosing the Priming Sequence

Here you can choose a priming sequence for the model to continue. A few are provided, or you can upload your own MIDI (Musical Instrument Digital Interface) file.

Set max_primer_seconds below to trim the primer to a fixed number of seconds (this will have no effect if the primer is already shorter than max_primer_seconds).

 #importing our piano music scales
 filenames = {
     'C major arpeggio': '/content/c_major_arpeggio.mid',
     'C major scale': '/content/c_major_scale.mid',
     'Clair de Lune': '/content/clair_de_lune.mid',
 }
 primer = 'C major scale'  #@param ['C major arpeggio', 'C major scale', 'Clair de Lune', 'Upload your own!']
 if primer == 'Upload your own!':
   primer_ns = upload_midi()
 else:
   # Use one of the provided primers.
   primer_ns = note_seq.midi_file_to_note_sequence(filenames[primer])
 # Handle sustain pedal in the primer.
 primer_ns = note_seq.apply_sustain_control_changes(primer_ns)
 # Trim to desired number of seconds.
 max_primer_seconds = 20  #@param {type:"slider", min:1, max:120}
 if primer_ns.total_time > max_primer_seconds:
   print('Primer is longer than %d seconds, truncating.' % max_primer_seconds)
   primer_ns = note_seq.extract_subsequence(
       primer_ns, 0, max_primer_seconds)
 # Remove drums from the primer if present.
 if any(note.is_drum for note in primer_ns.notes):
   print('Primer contains drums; they will be removed.')
   notes = [note for note in primer_ns.notes if not note.is_drum]
   del primer_ns.notes[:]
   primer_ns.notes.extend(notes)
 # Set primer instrument and program.
 for note in primer_ns.notes:
   note.instrument = 1
   note.program = 0
 # Play and plot the primer.
 note_seq.play_sequence(
     primer_ns,
     synth=note_seq.fluidsynth, sample_rate=SAMPLE_RATE, sf2_path=SF2_PATH)
 note_seq.plot_sequence(primer_ns)

Next, we let the model continue the piano performance, starting from the chosen priming sequence.

 #setting the target encoder
 targets = unconditional_encoders['targets'].encode_note_sequence(
     primer_ns)
 # Remove the end token from the encoded primer.
 targets = targets[:-1]
 decode_length = max(0, 4096 - len(targets))
 if len(targets) >= 4096:
   print('Primer has more events than maximum sequence length; nothing will be generated.')
 # Generate sample events.
 sample_ids = next(unconditional_samples)['outputs']
 # Decode to NoteSequence.
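 midi_filename = decode(
     sample_ids,
     encoder=unconditional_encoders['targets'])
 ns = note_seq.midi_file_to_note_sequence(midi_filename)
 # Append continuation to primer.
 continuation_ns = note_seq.concatenate_sequences([primer_ns, ns])
 # Play and plot.
 note_seq.play_sequence(
     continuation_ns,
     synth=note_seq.fluidsynth, sample_rate=SAMPLE_RATE, sf2_path=SF2_PATH)
 note_seq.plot_sequence(continuation_ns)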
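
The length bookkeeping in the code above is easy to get wrong, so here is a small standalone sketch of it. The helper name continuation_budget is hypothetical; the limit of 4096 events and the removal of the trailing end token are taken from the code above.

```python
MAX_SEQUENCE_LENGTH = 4096  # limit used in the continuation code above


def continuation_budget(encoded_primer, max_length=MAX_SEQUENCE_LENGTH):
    """Return (targets, decode_length) for an encoded primer ending in an end token."""
    # Drop the end token so the model continues the primer instead of stopping.
    targets = list(encoded_primer[:-1])
    # Whatever room is left under the limit is available for new events.
    decode_length = max(0, max_length - len(targets))
    return targets, decode_length


# A short primer: 100 events plus the end token leaves room for 3996 new events.
short_targets, short_budget = continuation_budget(list(range(101)))
print(len(short_targets), short_budget)

# A primer longer than the limit leaves no room, so nothing would be generated.
long_targets, long_budget = continuation_budget([0] * 5000)
print(len(long_targets), long_budget)
```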
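
Choosing the Melody

Here you can choose a melody for the model to accompany. A few are provided, or you can upload a MIDI file; if your MIDI file is polyphonic, the notes with the highest pitch will be used as the melody. In this example, we use the melody of the nursery rhyme Twinkle Twinkle Little Star. The code in this section and the next assumes that melody_conditioned_encoders and melody_conditioned_samples have already been created by loading the melody-conditioned checkpoint, as the official notebook does; that setup step is not shown here.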

 # Tokens to insert between melody events.
 event_padding = 2 * [note_seq.MELODY_NO_EVENT]
 # setting melody sequence
 melodies = {
     'Mary Had a Little Lamb': [
         64, 62, 60, 62, 64, 64, 64, note_seq.MELODY_NO_EVENT,
         62, 62, 62, note_seq.MELODY_NO_EVENT,
         64, 67, 67, note_seq.MELODY_NO_EVENT,
         64, 62, 60, 62, 64, 64, 64, 64,
         62, 62, 64, 62, 60, note_seq.MELODY_NO_EVENT,
         note_seq.MELODY_NO_EVENT, note_seq.MELODY_NO_EVENT
     ],
     'Row Row Row Your Boat': [
         60, note_seq.MELODY_NO_EVENT, note_seq.MELODY_NO_EVENT,
         60, note_seq.MELODY_NO_EVENT, note_seq.MELODY_NO_EVENT,
         60, note_seq.MELODY_NO_EVENT, 62,
         64, note_seq.MELODY_NO_EVENT, note_seq.MELODY_NO_EVENT,
         64, note_seq.MELODY_NO_EVENT, 62,
         64, note_seq.MELODY_NO_EVENT, 65,
         67, note_seq.MELODY_NO_EVENT, note_seq.MELODY_NO_EVENT,
         note_seq.MELODY_NO_EVENT, note_seq.MELODY_NO_EVENT, note_seq.MELODY_NO_EVENT,
         72, 72, 72, 67, 67, 67, 64, 64, 64, 60, 60, 60,
         67, note_seq.MELODY_NO_EVENT, 65,
         64, note_seq.MELODY_NO_EVENT, 62,
         60, note_seq.MELODY_NO_EVENT, note_seq.MELODY_NO_EVENT,
         note_seq.MELODY_NO_EVENT, note_seq.MELODY_NO_EVENT, note_seq.MELODY_NO_EVENT
     ],
     'Twinkle Twinkle Little Star': [
         60, 60, 67, 67, 69, 69, 67, note_seq.MELODY_NO_EVENT,
         65, 65, 64, 64, 62, 62, 60, note_seq.MELODY_NO_EVENT,
         67, 67, 65, 65, 64, 64, 62, note_seq.MELODY_NO_EVENT,
         67, 67, 65, 65, 64, 64, 62, note_seq.MELODY_NO_EVENT,
         60, 60, 67, 67, 69, 69, 67, note_seq.MELODY_NO_EVENT,
         65, 65, 64, 64, 62, 62, 60, note_seq.MELODY_NO_EVENT
     ]
 }
 melody = 'Twinkle Twinkle Little Star'  
 if melody == 'Upload your own!':
   # Extract melody from user-uploaded MIDI file.
   melody_ns = upload_midi()
   melody_instrument = note_seq.infer_melody_for_sequence(melody_ns)
   notes = [note for note in melody_ns.notes
            if note.instrument == melody_instrument]
   del melody_ns.notes[:]
   melody_ns.notes.extend(
       sorted(notes, key=lambda note: note.start_time))
   for i in range(len(melody_ns.notes) - 1):
     melody_ns.notes[i].end_time = melody_ns.notes[i + 1].start_time
   inputs = melody_conditioned_encoders['inputs'].encode_note_sequence(
       melody_ns)
 else:
   # Use one of the provided melodies.
   events = [event + 12 if event != note_seq.MELODY_NO_EVENT else event
             for e in melodies[melody]
             for event in [e] + event_padding]
   inputs = melody_conditioned_encoders['inputs'].encode(
       ' '.join(str(e) for e in events))
   melody_ns = note_seq.Melody(events).to_sequence(qpm=150)
 # Play and plot the melody.
 note_seq.play_sequence(
     melody_ns,
     synth=note_seq.fluidsynth, sample_rate=SAMPLE_RATE, sf2_path=SF2_PATH)
 note_seq.plot_sequence(melody_ns)
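
The nested list comprehension that builds events for the built-in melodies does two things at once: it raises every pitch by an octave and follows every melody step with two no-event steps. The standalone sketch below spells that out. The helper name pad_and_transpose is hypothetical, and the local constant is assumed to match note_seq.MELODY_NO_EVENT.

```python
MELODY_NO_EVENT = -2  # assumed to match note_seq.MELODY_NO_EVENT


def pad_and_transpose(melody, padding=2, transpose=12):
    """Follow each melody step with `padding` no-event steps and shift pitches up."""
    events = []
    for step in melody:
        for event in [step] + padding * [MELODY_NO_EVENT]:
            if event != MELODY_NO_EVENT:
                event += transpose  # only real pitches are transposed
            events.append(event)
    return events


# First notes of Twinkle Twinkle Little Star, followed by one rest step.
padded_events = pad_and_transpose([60, 60, 67, MELODY_NO_EVENT])
print(padded_events)
```

Each original step therefore occupies three steps in the padded melody, which stretches the tune out and leaves the model room to place accompaniment between melody notes.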

Improving the Melody with Accompaniment

Generate a piano performance consisting of the chosen melody plus accompaniment.

 # Generate sample events.
 decode_length = 4096
 sample_ids = next(melody_conditioned_samples)['outputs']
 # Decode to NoteSequence.
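 midi_filename = decode(
     sample_ids,
     encoder=melody_conditioned_encoders['targets'])
 accompaniment_ns = note_seq.midi_file_to_note_sequence(midi_filename)
 # Play and plot.
 note_seq.play_sequence(
     accompaniment_ns,
     synth=note_seq.fluidsynth, sample_rate=SAMPLE_RATE, sf2_path=SF2_PATH)
 note_seq.plot_sequence(accompaniment_ns)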

Running this cell plays the final output: the chosen melody together with its generated accompaniment.

Download the output as a MIDI file

Finally, we can save the accompaniment performance as a MIDI file and download it.

 #Download accompaniment performance as MIDI (optional).
 note_seq.sequence_proto_to_midi_file(
     accompaniment_ns, '/tmp/accompaniment.mid')
 files.download('/tmp/accompaniment.mid')

The MIDI file will then be downloaded through your browser to the location you choose.

The colab notebook for the above code is available here.


In this article, we used pretrained Music Transformer models through Score2Perf to generate symbolic piano performances: from scratch, as a continuation of a primer, and as accompaniment to a melody. You can also try sounds other than piano and explore Music Transformer further through hyperparameter tuning.

Happy Learning!


Victor Dey
Victor is an aspiring Data Scientist & is a Master of Science in Data Science & Big Data Analytics. He is a Researcher, a Data Science Influencer and also an Ex-University Football Player. A keen learner of new developments in Data Science and Artificial Intelligence, he is committed to growing the Data Science community.
