Last updated February 26, 2021

Hands-on Guide To GANSynth: An Adversarial Neural Audio Synthesis Technique


Published on February 26, 2021

by Nikita Shiledarbaxi

GANSynth is a state-of-the-art method for synthesizing high-fidelity, locally coherent audio using Generative Adversarial Networks (GANs); hence the name GANSynth (a GAN used for audio synthesis). It was introduced in 2019 by Jesse Engel, Kumar Krishna Agrawal, Shuo Chen, Ishaan Gulrajani, Chris Donahue and Adam Roberts, researchers at Google AI (research paper).

Autoregressive models such as WaveNet generate audio sequentially, one sample at a time. GANSynth instead generates the whole sequence in parallel, so on a GPU it can synthesize audio much faster than real time. It produces the entire audio clip from a single latent vector, which makes it easier to disentangle global features such as pitch and timbre (tone quality). It uses a progressive GAN architecture. Earlier GANs offered global latent conditioning and efficient parallel sampling but struggled to synthesize locally coherent audio waveforms; GANSynth addresses this drawback.

Are you interested in understanding the detailed workings of GANSynth? Refer to this page before proceeding!

Practical Implementation of GANSynth

Here’s a demonstration of how GANSynth learns to produce the musical notes of individual instruments contained in the NSynth dataset (a large, high-quality dataset of annotated musical notes). The GAN learns to use its latent space to represent different instrument timbres. The demo synthesizes audio from MIDI files and interpolates between different instruments. The code has been implemented in Google Colab using Python version 3.7.10.

Step-wise explanation of the code is as follows:

1) Install Magenta (an open-source Python library powered by TensorFlow)

 #Create working directories before copying data from GCS (Google Cloud Storage)
 !rm -r /content/gansynth &>/dev/null
 !mkdir /content/gansynth
 !mkdir /content/gansynth/midi
 !mkdir /content/gansynth/samples
 #Load the default MIDI file (Bach Prelude)
 #The 'curl' command fetches the given URL
 !curl -o /content/gansynth/midi/bach.mid http://www.jsbach.net/midi/cs1-1pre.mid

The -o option of the curl command saves the downloaded file on your local machine under the name given as its argument.

 SONG = '/content/gansynth/midi/bach.mid'
 !curl -o /content/gansynth/midi/riff-default.mid http://storage.googleapis.com/magentadata/papers/gansynth/midi/arp.mid
 RIFF = '/content/gansynth/midi/riff-default.mid'
 !pip install -q -U magenta

2) Import required libraries and classes

 import os #module for interacting with the operating system
 #To load files from local device (weblink)
 from google.colab import files 
 import librosa #Python library for music and audio analysis
 from magenta.models.nsynth.utils import load_audio
 from magenta.models.gansynth.lib import flags as lib_flags
 from magenta.models.gansynth.lib import generate_util as gu
 from magenta.models.gansynth.lib import model as lib_model
 from magenta.models.gansynth.lib import util
 import matplotlib.pyplot as plt #for visualization
 import note_seq
 from note_seq.notebook_utils import colab_play as play
 #colab_play() inserts an HTML audio widget to play a sound in colab
 import numpy as np
 import tensorflow.compat.v1 as tf
 #disable_v2_behavior() switches all global behaviors which vary between  
 #tensorflow 1.x and 2.x versions to behave as in 1.x.
 tf.disable_v2_behavior()

3) Define a function for uploading a MIDI file

 def upload():
   uploaded = files.upload() #Upload the file(s); returns a dict of {name: bytes}
   file_list = [] #Initialize list to store names of uploaded files
   #Use items() to iterate over key-value pairs of the dictionary of uploaded file content
   for key, val in uploaded.items():
     filename = os.path.join('/content/gansynth/midi', key)
     with open(filename, 'wb') as file: #open the file in binary write mode
       #write the content of the uploaded file to the specified file
       file.write(val)
       print('Writing the file {}'.format(filename))
     file_list.append(filename) #Add the filename to the list
   return file_list

4) Define global variables

 #checkpoint directory
 CHECKPOINT_DIR = 'gs://magentadata/models/gansynth/acoustic_only'
 OP_DIR = '/content/gansynth/samples' #output directory
 BATCH_SIZE = 16
 SR = 16000 #SR stands for Sample Rate

5) Create an output directory if it does not exist

 #Expand the path of the output directory using expand_path()
 OP_DIR = util.expand_path(OP_DIR)
 #tensorflow.gfile.Exists() checks whether the path exists
 if not tf.gfile.Exists(OP_DIR):
   #Create the directory using tensorflow.gfile.MakeDirs()
   tf.gfile.MakeDirs(OP_DIR)

6) Load the model

 #Clear the default graph stack and reset the global default graph
 tf.reset_default_graph()
 #Dictionary for storing and accessing flags; the keys must match the
 #flag names that the Magenta GANSynth code expects
 myflags = lib_flags.Flags({
     'batch_size_schedule': [BATCH_SIZE],
     'tfds_data_dir': "gs://tfds-data/datasets",
 })
 #Create a GAN model using flags and weights from a saved model
 model = lib_model.Model.load_from_path(CHECKPOINT_DIR, myflags)

7) Define a function for loading a MIDI file as a NoteSequence

 def midiLoad(path, minimumPitch=36, maximumPitch=84):
   midiPath = util.expand_path(path) #Expand the file path
   noteSequence = note_seq.midi_file_to_sequence_proto(midiPath)
   #Define NumPy arrays to store pitches, velocities, start and end
   #times of each note
   pitches = np.array([nt.pitch for nt in noteSequence.notes])
   velo = np.array([nt.velocity for nt in noteSequence.notes])
   startTimes = np.array([nt.start_time for nt in noteSequence.notes])
   endTimes = np.array([nt.end_time for nt in noteSequence.notes])
   #Keep only the notes in the required pitch range
   valid = np.logical_and(pitches >= minimumPitch, pitches <= maximumPitch)
   #Store the valid notes' features in the form of a dictionary
   notes = {'pitches': pitches[valid],
            'velocities': velo[valid],
            'startTimes': startTimes[valid],
            'endTimes': endTimes[valid]}
   return noteSequence, notes
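
To see what the pitch-range filter does in isolation, here is a minimal sketch on hand-made arrays. The note values are invented for illustration and no MIDI file is read; only the boolean-mask step is reproduced.

```python
import numpy as np

# Hypothetical note data (not taken from any MIDI file)
pitches = np.array([30, 36, 60, 84, 90])
velocities = np.array([100, 90, 80, 70, 60])

# Boolean mask that is True only for pitches inside [36, 84]
valid = np.logical_and(pitches >= 36, pitches <= 84)

# Indexing with the mask keeps the matching entries of every array
kept_pitches = pitches[valid]
kept_velocities = velocities[valid]
print(kept_pitches)     # pitches 30 and 90 are dropped
print(kept_velocities)  # their velocities are dropped as well
```

Applying the same mask to every array keeps the pitch, velocity and timing entries of each note aligned.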

8) Create an attack, sustain and release amplitude envelope (these are the stages of an envelope generator)

‘Attack’ is the part of the envelope that represents the time taken by the amplitude to reach its peak. ‘Sustain’ is the duration for which the sound is held before it fades out. ‘Release’ is the final reduction in amplitude over time.

 def createEnvelope(note_length, attack=0.010, release=0.3, sr=16000):
   #sr means sample rate
   note_len = min(note_length, 3.0) #cap the sustained part at 3 seconds
   attack = int(sr * attack) #attack length in samples
   sustain = int(sr * note_len) #sustain length in samples
   release = int(sr * release) #release length in samples
   total = sustain + release #attack envelope doesn't add to sound length
   env = np.ones(total) #array of 1's of length 'total'
   #Linear attack: evenly spaced numbers from 0 to 1, 'attack' points
   env[:attack] = np.linspace(0.0, 1.0, attack)
   #Linear release: evenly spaced numbers from 1 to 0, 'release' points
   env[sustain:total] = np.linspace(1.0, 0.0, release)
   return env
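
As a quick sanity check of the envelope shape, the sketch below rebuilds the same attack/sustain/release logic under the hypothetical name `sketch_envelope` and inspects a 0.5-second note at 16 kHz. It is an illustration only and is not part of the notebook.

```python
import numpy as np

def sketch_envelope(note_length, attack=0.010, release=0.3, sr=16000):
    """Illustrative copy of the envelope logic; the name is hypothetical."""
    note_len = min(note_length, 3.0)   # cap the sustain at 3 seconds
    n_attack = int(sr * attack)        # samples in the fade-in
    n_sustain = int(sr * note_len)     # samples before the release starts
    n_release = int(sr * release)      # samples in the fade-out
    env = np.ones(n_sustain + n_release)
    env[:n_attack] = np.linspace(0.0, 1.0, n_attack)    # linear attack
    env[n_sustain:] = np.linspace(1.0, 0.0, n_release)  # linear release
    return env

env = sketch_envelope(0.5)
print(len(env))         # total samples: sustain plus release
print(env[0], env[-1])  # the envelope starts and ends at silence
```

Multiplying a note by this array fades it in over the attack and out over the release, which avoids clicks at the note boundaries.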

9) Define a function to combine multiple notes into a single audio clip.

 def combine_notes(audio, start, end, velo, sr=16000):
   #'audio' is an array of audio notes, 'start' and 'end' are arrays of note
   #start and end times in seconds, 'velo' is an array of note velocities,
   #'sr' is the sample rate (integer)
   numberOfNotes = len(audio) #Number of notes
   clipLength = end.max() + 3.0 #compute length of audio clip in seconds
   clip = np.zeros(int(clipLength) * sr) #generate silent audio clip
   for t_start, t_end, velocity, i in zip(start, end, velo, range(numberOfNotes)):
     #Generate an amplitude envelope
     noteLen = t_end - t_start #compute note length
     #call createEnvelope() defined above
     env = createEnvelope(noteLen)
     length = len(env) #length of generated envelope
     audio_note = audio[i, :length] * env
     #Normalize the note and scale it by its velocity
     audio_note /= audio_note.max()
     audio_note *= (velocity / 127.0)
     clipStart = int(t_start * sr) #start sample of the note in the clip
     clipEnd = clipStart + length #end sample of the note in the clip
     #Add the audio note to clip buffer
     clip[clipStart:clipEnd] += audio_note
   #Normalize the audio clip once, after all notes have been added
   clip /= clip.max()
   clip /= 2.0
   return clip #Array of combined audio samples
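
The core of this step is adding each note into a shared buffer at its start position and normalizing once at the end. The following sketch shows that idea on toy data; the tiny sample rate and the constant-amplitude "notes" are assumptions chosen only to keep the arrays readable.

```python
import numpy as np

sr = 4                    # tiny hypothetical sample rate
clip = np.zeros(3 * sr)   # 3-second buffer of silence

# Two hypothetical one-second "notes" with constant amplitude
note_a = np.full(sr, 0.5)
note_b = np.full(sr, 1.0)

# Place each note at its start time (in seconds) by adding into the buffer
for start_time, note in [(0.0, note_a), (0.5, note_b)]:
    first = int(start_time * sr)
    clip[first:first + len(note)] += note  # overlapping samples sum

clip /= clip.max()  # peak-normalize to 1.0
clip /= 2.0         # then halve, leaving headroom
print(clip)
```

Because the notes overlap between 0.5 s and 1.0 s, their samples add up there, and the final normalization keeps the loudest point of the mix at 0.5.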

10) Define a function to plot the spectrogram

 def spectrogram(audioClip):
   minPitch = 36 #lowest MIDI pitch shown
   maxPitch = 84 #highest MIDI pitch shown
   #Get the frequency of the MIDI notes in Hertz (Hz)
   minF = librosa.midi_to_hz(minPitch) #minimum frequency
   maxF = 2 * librosa.midi_to_hz(maxPitch) #maximum frequency
   #number of octaves
   octaves = int(np.ceil(np.log2(maxF) - np.log2(minF)))
   binsPerOctave = 36 #number of bins in each octave
   nBins = int(binsPerOctave * octaves) #number of bins
   #Calculate the constant-Q transform (CQT) of the audio signal:
   #'audioClip' is the audio time series
   #'sr' is the sampling rate of audioClip
   #'hop_length' is the number of samples between successive CQT columns
   #'fmin' is the minimum frequency
   #'n_bins' is the number of frequency bins
   C = librosa.cqt(audioClip, sr=SR, hop_length=2048, fmin=minF,
                   n_bins=nBins, bins_per_octave=binsPerOctave)
   #Compute the power of the audio signal in decibels
   power = 10 * np.log10(np.abs(C)**2 + 1e-6)
   #Display the 'power' array as a matrix in a new figure window using
   #matshow() of matplotlib
   plt.matshow(power[::-1, 2:-2], aspect='auto', cmap=plt.cm.magma)
   plt.yticks([])
   plt.xticks([])
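
The frequency range of the plot follows from the MIDI pitch limits. The sketch below works through that arithmetic with a hand-written `midi_to_hz` helper, which assumes the standard equal-temperament tuning with A4 (MIDI note 69) at 440 Hz; it stands in for the librosa call so that the example runs without librosa.

```python
import math

def midi_to_hz(note):
    """Equal-temperament conversion, assuming A4 (MIDI 69) = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

f_min = midi_to_hz(36)      # lowest pitch kept from the MIDI file
f_max = 2 * midi_to_hz(84)  # one octave above the highest pitch

# Number of octaves between the two frequencies
octave_span = math.log2(f_max / f_min)
print(round(f_min, 2), round(f_max, 2), round(octave_span, 3))
```

MIDI notes 36 and 84 are four octaves apart, and doubling the upper frequency adds one more, so the plot spans about five octaves. At 36 bins per octave, a span of exactly five octaves would give 180 frequency bins.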

11) Choose the MIDI file

midi_file = "Arpeggio (Default)" #@param ["Arpeggio (Default)", "Upload your own"]

This will allow you to choose the default uploaded MIDI file or upload a file of your choice as follows:

 #Path of the default uploaded file
 midi_path = RIFF
 #If the user chooses the 'Upload your own' option
 if midi_file == "Upload your own":
   try:
     fileList = upload() #Upload your file
     midi_path = fileList[0] #Path of recently uploaded file
     #Load the uploaded file
     noteSequence, notes = midiLoad(midi_path)
   except Exception as e: #Catch the exception if uploading fails
     print('Upload Cancelled')
 else:
   #Load the default uploaded file, but slow it down 30%
   noteSequence, notes = midiLoad(midi_path)
   notes['startTimes'] *= 1.3
   notes['endTimes'] *= 1.3
 #Plot the notesequence
 note_seq.plot_sequence(noteSequence)

Output:

12) Choose some random instruments to generate a custom interpolation.

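Here, ‘interpolation’ means blending smoothly between instruments: the latent vectors of the chosen instruments are mixed over time, so the timbre morphs gradually from one instrument to the next.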

 #Select number of instruments
 number_of_random_instruments = 10 #@param {type:"slider", min:4, max:16, step:1}

A slider will appear that lets you choose the number of instruments, from 4 to 16 in steps of 1.

 pitchPreview = 60
 num = number_of_random_instruments
 pitches = [pitchPreview] * num #Use the same preview pitch for every instrument
 #Generate latent vectors, one per instrument
 latent_vector = model.generate_z(num)
 #Generate audio samples from the latent vectors and pitches of all the instruments
 audio_notes = model.generate_samples_from_z(latent_vector, pitches)
 for i, audio_note in enumerate(audio_notes):
   #Print the instrument number
   print("Instrument: {}".format(i))
   #Insert the HTML audio widget for each instrument's audio; pass the
   #float sound array (audio_note) and the sample rate as parameters
   play(audio_note, sample_rate=16000)

Audio files of the instruments:

Instrument0 Instrument1 Instrument2 Instrument3 Instrument4 Instrument5 Instrument6 Instrument7 Instrument8 Instrument9

Sample output showing widget for each instrument’s sound:

(You can play the audio, adjust its volume and download it using the widgets)

13) Create a list of instruments to interpolate between

instruments = [0, 2, 4, 0]

Place each instrument at a specific point in time (from 0 to 1.0, as a fraction of the length of the piece)

times = [0, 0.3, 0.6, 1.0]

Force the endpoints to the start and end of the synthesized audio

 times[0] = -0.001
 times[-1] = 1.0

14) Latent vectors of selected instruments

z_instruments = np.array([latent_vector[i] for i in instruments])

Times (in seconds) at which the selected instruments are placed, obtained by scaling each fraction by the end time of the last note

 t_instruments = np.array([notes['endTimes'][-1] * t for t in times])

Get interpolated latent vectors for each note

z_notes = gu.get_z_notes(notes['startTimes'], z_instruments, t_instruments)
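
To make the idea of per-note interpolation concrete, here is a minimal sketch that blends anchor latent vectors at each note's start time. It uses plain linear interpolation and a hypothetical helper named `sketch_interpolate_z`; `gu.get_z_notes` may use a different scheme (for example spherical interpolation), so treat this as an illustration of the concept rather than a description of Magenta's code.

```python
import numpy as np

def sketch_interpolate_z(note_times, z_anchors, t_anchors):
    """Linearly blend anchor latent vectors at each note time (hypothetical helper)."""
    z_notes = []
    for t in note_times:
        # Index of the segment [t_anchors[i], t_anchors[i + 1]] containing t
        i = np.searchsorted(t_anchors, t, side='right') - 1
        i = int(np.clip(i, 0, len(t_anchors) - 2))
        # Fraction of the way through that segment, from 0.0 to 1.0
        frac = (t - t_anchors[i]) / (t_anchors[i + 1] - t_anchors[i])
        z_notes.append((1.0 - frac) * z_anchors[i] + frac * z_anchors[i + 1])
    return np.array(z_notes)

# Two hypothetical 2-D "instruments" placed at t = 0 s and t = 4 s
z_anchors = np.array([[0.0, 0.0], [1.0, 2.0]])
t_anchors = np.array([0.0, 4.0])
z = sketch_interpolate_z(np.array([0.0, 1.0, 4.0]), z_anchors, t_anchors)
print(z)
```

A note that starts a quarter of the way between two anchors receives a latent vector that is a quarter of the way from the first instrument to the second, which is why the timbre changes gradually across the piece.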

15) Generate audio for each note

 print('Generating {} samples...'.format(len(z_notes)))
 audio_notes = model.generate_samples_from_z(z_notes, notes['pitches'])

16) Combine the audio samples of all instruments into a single audio clip

 ac = combine_notes(audio_notes,
                    notes['startTimes'],
                    notes['endTimes'],
                    notes['velocities'])

17) Play the synthesized audio

 print('\nAudio:')
 #Create audio widget; pass the clip and specify the sample rate
 play(ac, sample_rate=SR)

18) Plot the spectrogram using spectrogram() function defined in step (10)

 print('CQT Spectrogram:')
 spectrogram(ac)

Synthesized audio output
The Google Colab notebook of the above implementation can be found here.

