Researchers from Intel Lab have landscaped Grand Theft Auto V to make it look almost photorealistic. The team modified the graphics by training CNNs on real-world images and then applied the training data to rendered images from the game.
Let’s take a look at the research that made this possible.
A team of researchers, including Stephan R. Richter, Hassan Abu AlHaija, and Vladlen Koltun, from Intel Lab, has proposed the new approach to enhance the realism of synthetic images. In the research paper titled ‘Enhancing photorealism enhancement’, the team introduces a CNN model to enhance the images by leveraging intermediate representations produced by traditional rendering pipelines.
The network is trained through a novel adversarial objective, providing strong supervision at multiple perceptual levels. The researchers analysed scene layout distributions in datasets and proposed a new strategy for sampling image patches during training.
The team introduced multiple architectural improvements in the deep network modules used for photorealism enhancement.
“Our starting point is a set of intermediate buffers (G-buffers) produced by game engines during the rendering process. These buffers provide detailed information on geometry, materials, and lighting in the scene. We train convolutional networks with these auxiliary inputs to enhance the realism of images produced by the rendering pipeline,” said researchers. The approach resulted in glossy cars, smooth roads, and good greenery. The team took rendered images from GTA V games (left of the image) and made enhancements (right).
The method consists of an image enhancement network, which takes as input a rendered image and outputs an enhanced image. The G-buffers are capable of identifying various details, such as the scene’s geometry, materials, texture, lighting etc.
The G-buffer features are fed into the image enhancement network, where they are used to modulate image features. The HRNetV2 is used to construct the image enhancement network. The HRNet divides an image into several branches, each of which works at a different resolution. To maintain a fine image structure, one function stream is held at a reasonably high resolution (about 1/4 of the input resolution).
” Our approach significantly enhances the realism of rendered images. This is confirmed by a comprehensive evaluation of our method against strong baselines. Intuitively, our method achieves the strongest and most consistent results for objects and scenes that have clear correspondences in the real dataset; our method excels at road textures, cars, and vegetation,” as per the researchers.
The method generates high-quality improvements that are geometrically and semantically compatible with the original images while still matching the dataset’s design. Every minute details of the image are modified to provide an immersive gaming experience.
As per the Intel lab researchers, their method integrates learning-based approaches with conventional real-time rendering pipelines. “Since Gbuffers that are used as input are produced natively on the GPU, our method could be integrated more deeply into game engines, increasing efficiency and possibly further advancing the level of realism,” as per the researchers. The researchers reported substantial gains in stability and realism in comparison to recent image-to-image translation methods and a variety of other baselines.
From 2021 to 2026, the market for the gaming GPU is expected to grow at a CAGR of 14.1%, as per the Gaming GPU Market Report.