Gaming has long been a hotbed of AI innovation, establishing precedents in fields like procedural content generation, pathfinding, decision-making algorithms, and human behaviour simulation. Now, it can add music generation to its arsenal. Activision Blizzard, one of the world’s biggest developers and known for games such as Call of Duty, Overwatch, and World of Warcraft, has recently published a patent detailing a system that uses artificial intelligence and machine learning algorithms to generate music unique to each player.
This system has the potential not only to change the way composers approach music in video games, but also to redefine immersion by creating a unique auditory experience for every player.
We first look at the specifics of how the proposed AI system works, and then at some earlier algorithmic methods that games have used to create unique experiences.
Inside Activision Blizzard’s AI music play
In the patent, filed with the US Patent and Trademark Office in April of this year, Activision Blizzard describes a system that dynamically generates and modulates music based on gaming events. As with any AI-based solution, the first step is to collect data: the player’s play patterns, profile, and performance, along with the music that was playing whenever the player engaged with the game. Together these form an activity–music pair.
This data is sent to a central server, which assigns the player to one or more player profiles depending on the nature of the data collected. These profiles correspond to beginner, enthusiast, and expert levels of skill.
The data is then used as a base to generate event data, which depicts the player’s engagement with ‘virtual elements’. This covers a wide gamut of possible data points, including the player’s pace, their capacity to defeat enemies, and their general approach to interacting with the game. The event data is in turn classified into two or more event profiles according to the value of the player’s engagement, ranging from low to high.
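As a rough illustration of these two classification steps, the sketch below buckets a player into a skill profile and an event into an engagement profile. The thresholds, field names, scoring formulas, and the intermediate "medium" engagement level are all assumptions made for the example; nothing in the patent, as summarised here, specifies them.

```python
from dataclasses import dataclass

# Hypothetical upper bounds for each bucket; not taken from the patent.
PLAYER_LEVELS = [(0.33, "beginner"), (0.66, "enthusiast"), (1.0, "expert")]
ENGAGEMENT_LEVELS = [(0.33, "low"), (0.66, "medium"), (1.0, "high")]


@dataclass
class PlayerStats:
    win_rate: float      # 0.0 to 1.0
    hours_played: float


@dataclass
class GameEvent:
    pace: float          # 0.0 (idle) to 1.0 (frantic)
    enemies_defeated: int
    enemies_faced: int


def bucket(score, levels):
    """Return the label of the first level whose upper bound covers score."""
    score = min(max(score, 0.0), 1.0)
    for upper, label in levels:
        if score <= upper:
            return label
    return levels[-1][1]


def player_profile(stats):
    # Blend win rate with experience, capping experience at 100 hours.
    experience = min(stats.hours_played / 100.0, 1.0)
    return bucket(0.5 * stats.win_rate + 0.5 * experience, PLAYER_LEVELS)


def event_profile(event):
    # Success rate against enemies, or 0.0 when no enemies were faced.
    if event.enemies_faced:
        success = event.enemies_defeated / event.enemies_faced
    else:
        success = 0.0
    return bucket(0.5 * event.pace + 0.5 * success, ENGAGEMENT_LEVELS)
```

A real system would presumably learn these boundaries from the collected data rather than hard-code them; fixed thresholds are used here only to keep the idea visible.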
A machine learning algorithm is then trained on these datasets, with the weights saved for later iterations of the model. The trained algorithm generates music by identifying the player’s mood from the event profiles and player profiles, in theory creating a reactive music system that responds to both the user’s activity and their interaction with virtual elements in the game. An additional model then changes elements of the song, such as beat, metre, tempo, chord progressions, loudness, and duration, among others, depending on the player profiles.
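The modulation step can be pictured as a mapping from the two profiles to musical parameters. The sketch below uses a hand-written rule table as a stand-in for the trained model the patent describes; the parameter names, base values, profile labels, and scaling factors are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MusicParams:
    tempo_bpm: float
    loudness_db: float
    layers: int  # number of instrument layers in the mix


# Hypothetical rules; a real system would learn these from data.
TEMPO_SCALE = {"low": 0.9, "medium": 1.0, "high": 1.2}
LOUDNESS_OFFSET_DB = {"low": -3.0, "medium": 0.0, "high": 3.0}
EXTRA_LAYERS = {"beginner": 0, "enthusiast": 1, "expert": 2}


def modulate(base, player_profile, event_profile):
    """Return a copy of base adjusted for the given profiles."""
    return replace(
        base,
        tempo_bpm=base.tempo_bpm * TEMPO_SCALE[event_profile],
        loudness_db=base.loudness_db + LOUDNESS_OFFSET_DB[event_profile],
        layers=base.layers + EXTRA_LAYERS[player_profile],
    )


base = MusicParams(tempo_bpm=100.0, loudness_db=-12.0, layers=2)
intense = modulate(base, "expert", "high")   # faster, louder, denser mix
calm = modulate(base, "beginner", "low")     # slower, quieter, sparser mix
```

Keeping the base parameters immutable and returning adjusted copies means two players on the same server can share one base track while each hears their own variation.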
This can result in a completely immersive experience for the user. Not only will the generated music fit the events happening on screen, which is the status quo today, but the player’s approach to playing the game will also be taken into consideration while creating it.
This will create an impactful experience for each player that is still distinct from other players’ experiences. Even different players on the same server, in the case of a multiplayer game, will each have fitting music accompanying their in-game escapades.
A brief history of music algorithms
One of the first systems used for interactive music was iMUSE, created by video game developer LucasArts in the early 1990s. The engine was born out of frustration with the existing audio system in the SCUMM game development engine, and was first used in the game Monkey Island 2: LeChuck’s Revenge.
The engine was relatively simple by modern standards, but it set the precedent for how games approach music. The iMUSE system synchronised music to the player’s actions and transitioned smoothly from one piece of music to another, a standout in an era when games commonly came bundled with fairly basic music that played on a loop. This concept became a cornerstone of video game music, which later evolved to include concepts such as horizontal resequencing and vertical reorchestration.
Resequencing is when pre-composed sections of music play according to the player’s action cues, such as their location or the activity they are undertaking. Whenever the scenario changes, the music engine either crossfades to the next segment, one befitting the new scenario, or changes the segment upon completion of the current musical phrase. For instance, the game can change the music being played when the player picks up a power-up and then revert to the original soundtrack when the power-up wears off.
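A minimal sketch of the second approach, switching only once the current phrase has finished, might look like the following. The class, segment names, and phrase length are hypothetical, and a real engine would work with audio buffers and a musical clock rather than a simple beat counter.

```python
class Resequencer:
    """Switches between pre-composed segments at phrase boundaries."""

    def __init__(self, segments, start, beats_per_phrase=8):
        self.segments = segments              # game state -> segment name
        self.current = segments[start]
        self.pending = None                   # segment queued for the next boundary
        self.beats_per_phrase = beats_per_phrase
        self.beat = 0

    def on_state_change(self, state):
        # Queue the change instead of cutting the music mid-phrase.
        target = self.segments[state]
        self.pending = target if target != self.current else None

    def tick(self):
        """Advance one beat; apply any queued switch when a phrase completes."""
        self.beat += 1
        if self.beat % self.beats_per_phrase == 0 and self.pending:
            self.current, self.pending = self.pending, None
        return self.current


music = Resequencer(
    {"normal": "main_theme", "powerup": "powerup_theme"},
    start="normal",
    beats_per_phrase=4,
)
music.on_state_change("powerup")              # player grabs a power-up
during_powerup = [music.tick() for _ in range(4)]
music.on_state_change("normal")               # power-up wears off
after_powerup = [music.tick() for _ in range(4)]
```

In this run the power-up theme only takes over on the fourth beat, when the phrase ends, and the main theme returns at the next phrase boundary in the same way.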
Reorchestration, on the other hand, changes the mix of different instruments on top of an ongoing loop of music. The player’s actions then determine the instrumentation of the soundtrack. For example, a game can have a certain track playing when the player is exploring, but amp up some components, like strings or percussion, when the player enters combat.
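One simple way to picture this is as a set of instrument stems that all play in sync, with only their volumes changing. In the sketch below, the stem names, target gains, and fade step are assumptions for illustration; each call nudges the mix a little closer to the target for the current game state so that layers fade in rather than jump.

```python
# Hypothetical stems and target gains (0.0 = silent, 1.0 = full volume).
LAYER_MIX = {
    "exploration": {"pads": 1.0, "strings": 0.3, "percussion": 0.0},
    "combat":      {"pads": 0.6, "strings": 1.0, "percussion": 1.0},
}


def step_gains(current, state, max_step=0.25):
    """Move each stem's gain toward the target mix by at most max_step."""
    target = LAYER_MIX[state]
    result = {}
    for stem, gain in current.items():
        delta = target[stem] - gain
        delta = max(-max_step, min(max_step, delta))   # limit the fade speed
        result[stem] = round(gain + delta, 3)
    return result


gains = dict(LAYER_MIX["exploration"])
first_step = step_gains(gains, "combat")      # combat begins: start fading in

# Keep stepping until the mix settles on the combat targets.
settled = first_step
for _ in range(3):
    settled = step_gains(settled, "combat")
```

Because the underlying loop never stops, the transition stays in time with the music; only the balance between the stems changes.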
Many algorithms have been written on the basis of these two concepts, creating varied solutions befitting the scope of each game and how players play it. One notable example of reorchestration is 2016’s hit game DOOM, which used a complex algorithm to make musical accompaniments to the player’s actions.
To begin with, song sections are broken into smaller parts, so as to let the algorithm pick the best part of each song to reflect the action on screen. Then, the player’s actions are closely monitored, with the music engine keeping track of the player’s movement speed, the number of enemies in combat, and the player’s current activity, i.e. exploration or combat.
Speaking on the DOOM soundtrack, Hugo Martin, Creative Director at id Software, said, “We ultimately struck a really good balance with hearing the things you need to hear from a gameplay perspective. . . but then ultimately making the player constantly feel like a badass. It crescendos beautifully, it accents every action that I do, it’s a rock concert.”
Then, considering these factors, the algorithm picks the most fitting snippet of the song and blends it in seamlessly with the track that is already playing. The result is a soundtrack that complements the player’s actions perfectly and provides an unforgettable experience.
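To make the selection step concrete, here is a toy version of the idea, not id Software’s actual algorithm. It assumes each snippet has been tagged by the composer with an activity and an intensity, and it blends movement speed and enemy count into a single score using made-up weights and limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    name: str
    activity: str     # "exploration" or "combat"
    intensity: float  # 0.0 (calm) to 1.0 (frantic), a hypothetical composer tag


def intensity(speed, enemies, max_speed=10.0, max_enemies=8):
    """Blend normalised movement speed and enemy count into one score."""
    s = min(speed / max_speed, 1.0)
    e = min(enemies / max_enemies, 1.0)
    return 0.4 * s + 0.6 * e   # assumed weights: enemies matter a bit more


def pick_snippet(snippets, activity, speed, enemies):
    """Choose the snippet for this activity whose tag is closest to the score."""
    score = intensity(speed, enemies)
    candidates = [s for s in snippets if s.activity == activity]
    return min(candidates, key=lambda s: abs(s.intensity - score))


LIBRARY = [
    Snippet("ambient_drone", "exploration", 0.1),
    Snippet("tense_pulse", "exploration", 0.5),
    Snippet("riff_a", "combat", 0.5),
    Snippet("riff_full", "combat", 0.9),
]

heavy_fight = pick_snippet(LIBRARY, "combat", speed=9.0, enemies=8)
light_fight = pick_snippet(LIBRARY, "combat", speed=2.0, enemies=2)
quiet_walk = pick_snippet(LIBRARY, "exploration", speed=1.0, enemies=0)
```

The blending of the chosen snippet into the track already playing is left out here; it could be handled at a phrase boundary or with a crossfade, as described earlier for resequencing.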
Even though such advanced tech is already used in modern games, Activision Blizzard aims to push the envelope further and completely personalise the game experience for each player.
AI-generated music might just be the tip of the iceberg when it comes to content in games. We are already seeing procedural generation enter the mainstream in games, not only to create music but also to create the world that the player is in. AI has also been leveraged to reduce system resource utilisation, as seen with NVIDIA’s DLSS technology. Using cutting-edge models will not only add value to games, but will also create a bevy of new experiences that will leave an indelible mark on the new generation of gamers.