This AI Model Can Figure Out Video Games By Its Cover

Recently, researchers from the Western Kentucky University proposed a multi-modal deep learning framework that has the capability to classify genres of video games based on the cover and textual description. The researchers claimed that this research is the first-ever attempt on automatic genre classification using a deep learning approach.

Videos games have been one of the most widespread, profitable, and prominent forms of entertainment around the globe. Also, genre and its classification systems play a significant role in the development of video games. 

According to the researchers, video game covers and textual descriptions are usually the very first impression to its consumers, and they often convey important information about the video games. 

However, researchers often find it difficult to classify video game genres based on its cover and textual description because of various reasons like a massive number of video game genres, many of which are not concretely defined; cover designs; and textual descriptions that may vary due to many external factors such as country, culture, and other such.

Also, with the growing competitiveness in the video game industry, the cover designers and typographers push the cover designs to its limit in the hope of attracting sales. In order to mitigate such problems, the researchers built this new deep learning framework.

The Tech Behind

For this, the researchers aimed to develop three deep learning algorithm for the task of video game genre classification, which are-

  • An image-based approach using the game covers
  • A text-based approach using the textual descriptions
  • A multimodal approach using both the game covers and textual description

They evaluated five image-based models and two text-based models using deep transfer learning methods for the task of video game genre classification. The image-based models include MobileNet-V1, MobileNet-V2, Inception-V1, Inception-V2 and ResNet-50. Also, the two text-based models include recurrent neural networks (RNN) with Long Short-Term Memory (LSTM).  

In addition to the cover images of the video games, they also used the game descriptions for genre classification. The main purpose of the game description is to express the objects involved in a game and the set of rules that induce transitions, resulting in a state-action space.

On the final step, the researchers considered a multimodal deep learning architecture based on both the game cover and description for the task of genre classification. This approach again involves two steps, which are: 

  • A neural network is trained on the classification task for each modality
  • Intermediate representations are extracted from each network and combined in a multimodal learning step. 

According to the researchers, the information from both modalities are then combined using the concatenation method, and hence it helped in increasing the classifying accuracy rate.

Dataset Used

To perform this research, the researchers created a large dataset of 50,000 video games that includes game cover images, description text, title text, and genre information from, a video game database. 

There were in total 21 genres found in the original dataset; however, they churned out 15 different genres, such as adventure, arcade, fighting, strategy, among others. The collected dataset can be used for a number of studies such as text recognition from images, automatic topic mining, and other such.

Contributions Of This Research

The researchers contributed to this research in a four-fold way:

  • Firstly, they compiled a large dataset consisting of 50,000 video games from 21 genres made of cover images, description text, and title text and the genre information. 
  • Secondly, image-based and text-based, state-of-the-art models are evaluated thoroughly for the task of genre classification for video games.
  • Thirdly, they developed an efficient multi-modal framework based on both images and texts.
  • Lastly, a thorough analysis of the experimental results is shown by the researchers as well as future work to improve the performance is also suggested.

Read the paper here.

More Great AIM Stories

Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.

More Stories


8th April | In-person Conference | Hotel Radisson Blue, Bangalore

Organized by Analytics India Magazine

View Event >>

30th Apr | Virtual conference

Organized by Analytics India Magazine

View Event >>

Yugesh Verma
All you need to know about Graph Embeddings

Embeddings can be the subgroups of a group, similarly, in graph theory embedding of a graph can be considered as a representation of a graph on a surface, where points of that surface are made up of vertices and arcs are made up of edges

Yugesh Verma
A beginner’s guide to Spatio-Temporal graph neural networks

Spatio-temporal graphs are made of static structures and time-varying features, and such information in a graph requires a neural network that can deal with time-varying features of the graph. Neural networks which are developed to deal with time-varying features of the graph can be considered as Spatio-temporal graph neural networks. 

Yugesh Verma
A guide to explainable named entity recognition

Named entity recognition (NER) is difficult to understand how the process of NER worked in the background or how the process is behaving with the data, it needs more explainability. we can make it more explainable.

3 Ways to Join our Community

Discord Server

Stay Connected with a larger ecosystem of data science and ML Professionals

Telegram Channel

Discover special offers, top stories, upcoming events, and more.

Subscribe to our newsletter

Get the latest updates from AIM