Recently, researchers from Western Kentucky University proposed a multi-modal deep learning framework that can classify the genre of a video game from its cover and textual description. The researchers claim this is the first attempt at automatic video game genre classification using a deep learning approach.
Video games are one of the most widespread, profitable, and prominent forms of entertainment around the globe, and genre classification systems play a significant role in their development.
According to the researchers, a video game's cover and textual description are usually the first impression consumers get, and they often convey important information about the game.
However, classifying video game genres from covers and textual descriptions is difficult for several reasons: there is a massive number of genres, many of which are not concretely defined, and cover designs and textual descriptions vary with external factors such as country and culture.
Also, with growing competitiveness in the video game industry, cover designers and typographers push cover designs to their limits in the hope of attracting sales. To mitigate these problems, the researchers built this new deep learning framework.
The Tech Behind
For this, the researchers developed three deep learning approaches for the task of video game genre classification:
- An image-based approach using the game covers
- A text-based approach using the textual descriptions
- A multimodal approach using both the game covers and textual descriptions
They evaluated five image-based models and two text-based models using deep transfer learning methods. The image-based models include MobileNet-V1, MobileNet-V2, Inception-V1, Inception-V2, and ResNet-50, while the text-based models are built on recurrent neural networks (RNNs) with Long Short-Term Memory (LSTM) units.
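To illustrate the transfer learning idea behind these models — reusing a frozen, pretrained feature extractor and training only a new classification head — here is a minimal, framework-free Python sketch. The random projection standing in for a pretrained backbone, the toy data, and all names are hypothetical illustrations, not the researchers' code:

```python
import math
import random

random.seed(0)

# Stand-in for a pretrained backbone (e.g. MobileNet or ResNet-50): a frozen
# random projection mapping raw inputs to feature vectors. In real transfer
# learning these weights come from ImageNet pretraining and are not updated.
FROZEN_W = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]

def extract_features(x):
    """Frozen feature extractor: ReLU(W @ x)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(data, labels, epochs=200, lr=0.5):
    """Train only a new classification head on top of the frozen features."""
    feats = [extract_features(x) for x in data]
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            g = p - y  # gradient of the log-loss w.r.t. the logit
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    f = extract_features(x)
    return sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)

# Toy demo: labels depend on the sign of the first input coordinate.
data = [[1, 0, 0, 0], [0.9, 0.1, 0, 0], [-1, 0, 0, 0], [-0.8, -0.2, 0, 0]]
labels = [1, 1, 0, 0]
w, b = train_head(data, labels)
```

Only the small head is trained here; the backbone stays fixed, which is what makes transfer learning cheap when labeled data is limited.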
In addition to the cover images, they also used the game descriptions for genre classification. A game description typically expresses the objects involved in a game and the set of rules that govern transitions, resulting in a state-action space.
In the final step, the researchers considered a multimodal deep learning architecture based on both the game cover and description. This approach involves two steps:
- A neural network is trained on the classification task for each modality
- Intermediate representations are extracted from each network and combined in a multimodal learning step.
According to the researchers, the information from both modalities is then combined using concatenation, which helped increase classification accuracy.
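The two-step fusion above can be sketched in a few lines. The feature values, dimensions, and classifier weights below are hypothetical placeholders, not the paper's actual parameters:

```python
import math
import random

# Hypothetical intermediate representations extracted from each trained
# unimodal network (e.g. penultimate-layer activations); values illustrative.
cover_features = [0.12, 0.80, 0.05, 0.33]   # from the image (CNN) branch
text_features  = [0.47, 0.02, 0.91]         # from the description (LSTM) branch

# Concatenation fusion: the joint representation is the two feature
# vectors joined end to end.
fused = cover_features + text_features

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

# A classifier head (here a single softmax layer with made-up weights)
# is then trained on the fused vector to predict the genre.
random.seed(1)
W = [[random.uniform(-1, 1) for _ in range(len(fused))] for _ in range(3)]
logits = [sum(wi * fi for wi, fi in zip(row, fused)) for row in W]
genre_probs = softmax(logits)
```

Concatenation is the simplest fusion scheme: each branch keeps its own learned representation, and the joint classifier learns how to weight the two modalities.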
For this research, the researchers compiled a large dataset of 50,000 video games, including cover images, description text, title text, and genre information, from IGDB.com, a video game database.
There were 21 genres in the original dataset; the researchers narrowed these down to 15, including adventure, arcade, fighting, and strategy. The collected dataset can also support other studies, such as text recognition from images and automatic topic mining.
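Games on databases like IGDB often carry multiple genre tags, and in that multi-label setting the genre information is typically encoded as a multi-hot vector before training. A small sketch, using a hypothetical subset of the genres rather than the dataset's exact label scheme:

```python
# A hypothetical subset of the 15 genres kept in the dataset.
GENRES = ["adventure", "arcade", "fighting", "strategy", "shooter"]
GENRE_INDEX = {g: i for i, g in enumerate(GENRES)}

def encode_genres(game_genres):
    """Multi-hot encoding: one slot per genre, 1 if the game has that tag."""
    vec = [0] * len(GENRES)
    for g in game_genres:
        vec[GENRE_INDEX[g]] = 1
    return vec

print(encode_genres(["adventure", "strategy"]))  # -> [1, 0, 0, 1, 0]
```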
Contributions Of This Research
The contributions of this research are four-fold:
- Firstly, they compiled a large dataset of 50,000 video games across 21 genres, consisting of cover images, description text, title text, and genre information.
- Secondly, they thoroughly evaluated state-of-the-art image-based and text-based models on the task of video game genre classification.
- Thirdly, they developed an efficient multi-modal framework based on both images and texts.
- Lastly, they presented a thorough analysis of the experimental results and suggested future work to improve performance.
Read the paper here.