What is Face Identity Disentanglement and How it outperformed GANs?


Face Identity Disentanglement via Latent Space Mapping sets a new state of the art in face image generation, surpassing existing approaches built on Generative Adversarial Networks such as StyleGAN. Generative Adversarial Networks, simply known as GANs, now hold a prominent place in deep learning, with applications including high-resolution image synthesis, image-to-image translation, video-to-video translation, image inpainting, and video inpainting. StyleGAN and other competing methods are well known for their face image generation abilities. However, they require extensive supervision and training, and they compromise on quality, which makes generalization difficult. Face Identity Disentanglement via Latent Space Mapping is a method that learns to represent image data as disentangled latent representations with minimal supervision, building on available pre-trained generative networks such as StyleGAN. By learning to map into the generator's latent space, the method achieves state-of-the-art quality as well as a rich, expressive latent space. Disentangled latent representations allow generative models to control and compose the disentangled factors in the image generation process.

Disentanglement is a generative model’s ability to control a single feature without affecting the other features. For instance, in face generation, disentanglement helps either generate faces of the same identity but with different attributes such as pose, expression and illumination, or generate faces of the same pose but with different identities. Disentanglement is considered a non-trivial task in machine learning. The framework discussed here demonstrates high-quality disentanglement of face identity from all other attributes and can generate high-resolution faces with different identities, different attributes, or both. The framework’s key idea is to map the disentangled latent representation to the latent space of a pre-trained generator such as StyleGAN. This Face Identity Disentanglement framework was developed by Yotam Nitzan, Amit Bermano and Daniel Cohen-Or of Tel-Aviv University, and Yangyan Li of Alibaba Cloud Intelligence Business Group.

Disentanglement framework with Latent Space Mapping

This disentanglement framework uses two encoders to generate the latent representation 𝑧, which consists of a description of the property of interest and a description of everything else. Here, the first encoder generates a latent representation of the identity of the face and the second encoder generates a latent representation of facial attributes such as pose, expression and illumination. The latent representation is then mapped to the latent space W of the pre-trained generator 𝐺. This decouples the task of learning high-quality image generation from the task of disentanglement. Because of disentanglement, the two parts of the latent representation carry entirely different, non-overlapping information. Therefore, the framework is trained solely to disentangle the provided input information and extract a useful representation that can be combined in the generator to synthesize high-quality target images.
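The dataflow described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: every network is replaced by a fixed random linear layer, and the dimensions and the names `encode_id`, `encode_attr`, `map_to_w`, `generate` and `synthesize` are hypothetical, not taken from the official repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes are illustrative assumptions, not values from the paper.
IMG_DIM, ID_DIM, ATTR_DIM, W_DIM = 64, 16, 8, 32


def make_linear(in_dim, out_dim):
    """Return a fixed random linear layer standing in for a trained network."""
    weights = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)
    return lambda x: x @ weights


encode_id = make_linear(IMG_DIM, ID_DIM)          # stand-in for the identity encoder
encode_attr = make_linear(IMG_DIM, ATTR_DIM)      # stand-in for the attribute encoder
map_to_w = make_linear(ID_DIM + ATTR_DIM, W_DIM)  # stand-in for the mapping to W
generate = make_linear(W_DIM, IMG_DIM)            # stand-in for the pre-trained generator G


def synthesize(image_id, image_attr):
    """Combine the identity of one image with the attributes of another."""
    # z is the concatenation of the identity part and the attribute part.
    z = np.concatenate([encode_id(image_id), encode_attr(image_attr)])
    w = map_to_w(z)
    return generate(w)


image_id = rng.standard_normal(IMG_DIM)
image_attr = rng.standard_normal(IMG_DIM)
output = synthesize(image_id, image_attr)
print(output.shape)
```

In the real framework the generator is a pre-trained StyleGAN; the sketch only shows how the two codes are joined into 𝑧 and passed through the mapping before generation.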

In this disentanglement framework, three input images are used to generate a 3-by-3 image matrix in which face identity is preserved along columns and facial attributes are preserved along rows.
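The row and column convention of such an image matrix can be made concrete with a small sketch. The function `label_pair` below is a hypothetical stand-in for the real synthesis pipeline: it only records which input supplied the identity and which supplied the attributes.

```python
def build_grid(inputs, synthesize):
    """Row r shares the attributes of inputs[r]; column c shares the identity of inputs[c]."""
    return [[synthesize(identity, attributes) for identity in inputs]
            for attributes in inputs]


def label_pair(identity, attributes):
    """Hypothetical stand-in for the generator: describe the pairing as text."""
    return f"id={identity}, attr={attributes}"


grid = build_grid(["A", "B", "C"], label_pair)
for row in grid:
    print(row)
```

Under this convention the diagonal cells pair each input with itself, so they correspond to reconstructions of the three inputs.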

Human faces possess many independent, high-dimensional features, and high photometric, geometric and kinematic complexities. This disentanglement framework concentrates on image synthesis with disentangled control over face identity while preserving the other facial attributes. This type of control is highly useful in applications such as reenactment and de-identification. The output quality is directly determined by the pre-trained generator employed. Hence this framework incorporates the state-of-the-art StyleGAN as the pre-trained generator. 

The dataflow and losses of the framework are as follows. Two input images, one for the face identity, 𝐼𝑖𝑑, and another for the facial attributes, 𝐼𝑎𝑡𝑡𝑟, are fed to their respective encoders. The latent representations are mapped to the latent space W, and the result is fed into the generator. An adversarial loss L𝑎𝑑𝑣 ensures proper mapping to the W space. Identity preservation is encouraged using L𝑖𝑑, which penalizes differences in identity between 𝐼𝑖𝑑 and 𝐼𝑜𝑢𝑡. Attribute preservation is encouraged using L𝑟𝑒𝑐 and L𝑙𝑛𝑑, which penalize pixel-level and facial landmark differences, respectively, between 𝐼𝑎𝑡𝑡𝑟 and 𝐼𝑜𝑢𝑡.
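How the four loss terms might be combined into one training objective can be sketched as follows. The distance functions (mean absolute and mean squared differences) and the weights are assumptions chosen for illustration; the article does not state the exact formulas or values, and the adversarial term is taken as a number computed elsewhere by a discriminator.

```python
import numpy as np

# Hypothetical loss weights; the article does not state the values used.
WEIGHTS = {"adv": 1.0, "id": 1.0, "rec": 1.0, "lnd": 1.0}


def identity_loss(emb_id, emb_out):
    """Mean absolute difference between identity embeddings of I_id and I_out."""
    return float(np.mean(np.abs(emb_id - emb_out)))


def reconstruction_loss(img_attr, img_out):
    """Pixel-level mean absolute difference between I_attr and I_out."""
    return float(np.mean(np.abs(img_attr - img_out)))


def landmark_loss(lnd_attr, lnd_out):
    """Mean squared difference between facial landmarks of I_attr and I_out."""
    return float(np.mean((lnd_attr - lnd_out) ** 2))


def total_loss(adv, emb_id, emb_out, img_attr, img_out, lnd_attr, lnd_out):
    """Weighted sum of the adversarial, identity, reconstruction and landmark terms."""
    return (WEIGHTS["adv"] * adv
            + WEIGHTS["id"] * identity_loss(emb_id, emb_out)
            + WEIGHTS["rec"] * reconstruction_loss(img_attr, img_out)
            + WEIGHTS["lnd"] * landmark_loss(lnd_attr, lnd_out))


# When the output matches both inputs perfectly, only the adversarial term remains.
emb = np.ones(4)
img = np.zeros((2, 2))
lnd = np.zeros((5, 2))
print(total_loss(0.5, emb, emb, img, img, lnd, lnd))
```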

Python Implementation of Disentanglement Framework


The following command clones the necessary source code, files and datasets from the official GitHub repository. Make sure that a CUDA GPU runtime is enabled on the local machine, Colab or Jupyter notebook.

!git clone


Confirm the proper file download using the command

!ls ID-disentanglement/



Activate the conda environment on the local machine. If the Anaconda3 distribution is not installed on the machine, or if the user works in Colab, the following commands install it.

For 64-bit machine,


For 32-bit machine,



Various pre-trained generators are available for training and inference. Users can opt for any of the available generators from the corresponding GitHub repository. Here, the FFHQ_StyleGAN_256x256 model is used in Colab. Since the models are stored in a shared directory in Google Drive, Colab must first be set up to download files from Google Drive using the following commands and code.

 !pip install -U -q PyDrive
 import os
 from pydrive.auth import GoogleAuth
 from pydrive.drive import GoogleDrive
 from google.colab import auth
 from oauth2client.client import GoogleCredentials 

Users must authenticate access to Google Drive by completing the verification prompt that the following code generates.

 # Prompt for the Colab user's Google credentials.
 auth.authenticate_user()
 gauth = GoogleAuth()
 gauth.credentials = GoogleCredentials.get_application_default()
 drive = GoogleDrive(gauth) 

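Finally, the pre-trained StyleGAN can be downloaded using the following code

 # 1. Choose a local directory to store the files.
 local_download_path = os.path.expanduser('~')
 try:
   os.makedirs(local_download_path)
 except: pass
 # 2. List every file in the shared Google Drive folder.
 file_list = drive.ListFile(
     {'q': "'1OgLvUhd9FX9_mPXrfqAWaLZsceQzE9l4' in parents"}).GetList()
 for f in file_list:
   # 3. Create & download by id.
   print('title: %s, id: %s' % (f['title'], f['id']))
   fname = os.path.join(local_download_path, f['title'])
   print('downloading to {}'.format(fname))
   f_ = drive.CreateFile({'id': f['id']})
   f_.GetContentFile(fname)
   # 4. Confirm that the downloaded file can be opened.
   with open(fname, 'rb') as downloaded:
     print('first bytes read: {}'.format(len(downloaded.read(1024))))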


The Face Identity Disentanglement framework is designed to use TensorFlow 2.x on Python 3.7 with CUDA 10.1 and cuDNN 7.6.5. The following commands create a conda environment that has the needed dependencies.

!exec bash

Within the shell, run the following command

conda env create -f environment.yml



A dataset for training and inference can be created locally using the commands

 cd ID-disentanglement/utils
 python \ 
     --resolution N \
     --batch_size BATCH_SIZE \
     --output_path OUTPUT_PATH \
     --pretrained_models_path PRETRAINED_MODELS_PATH \
     --num_images NUM_IMAGES \
     --gpu GPU 


The Face Identity Disentanglement framework can be trained using the following command

     --resolution N \
     --pretrained_models_path PRETRAINED_MODELS_PATH \
     --dataset BASE_DATASET_DIR \
     --batch_size BATCH_SIZE \
     --cross_frequency 3 \
     --train_data_size 70000 \
     --results_dir RESULTS_DIR        


Inference on the trained model with the downloaded test dataset can be performed using the following commands. 

     --pretrained_models_path PRETRAINED_MODELS_PATH \
     --load_checkpoint PATH_TO_WEIGHTS \
     --id_dir DIR_OF_IMAGES_FOR_ID \
     --attr_dir DIR_OF_IMAGES_FOR_ATTR \
     --output_dir DIR_FOR_OUTPUTS \
     --test_func infer_on_dirs 

The options above test performance on two sets of images, one for preserving face identity and another for preserving facial attributes.

     --pretrained_models_path PRETRAINED_MODELS_PATH \
     --load_checkpoint PATH_TO_WEIGHTS \
     --input_dir PARENT_DIR \
     --output_dir DIR_FOR_OUTPUTS \
     --test_func interpolate 

The options above test performance on three sets of images, one for preserving face identity and the other two for sequential interpolation of facial attributes.

Performance Evaluation of Disentanglement Framework

The qualitative and quantitative performance of the disentanglement framework is evaluated using the Flickr-Faces-HQ (FFHQ) image dataset.

Input images from the FFHQ image dataset. Face identity is preserved along columns while other facial attributes are preserved along rows.
Both input and output images are generated using the StyleGAN generator incorporating the Face Identity Disentanglement framework.
Qualitative comparison of the disentanglement framework with the existing state-of-the-art methods FSGAN and FaceShifter.

The disentanglement approach outperforms existing state-of-the-art methods for identity-preserving face generation, such as FaceShifter, FSGAN, ALAE, and pSp.

Notable applications of the Disentanglement framework


Sequential interpolation of a given image between two input images with different attributes. The identity of the given image is maintained throughout the interpolation. Here the input image of the identity source is not shown.


Sequential interpolation of two different input images of different identities and attributes. Both identity and attributes are matched to the input images at both ends of interpolation.
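Sequential interpolation of this kind is commonly implemented as a linear blend between two latent codes. The sketch below assumes plain linear interpolation over toy two-dimensional codes; the helper names `lerp` and `interpolation_path` are hypothetical, and the official implementation may interpolate differently.

```python
def lerp(start, end, t):
    """Linearly interpolate between two equal-length latent codes."""
    return [(1.0 - t) * a + t * b for a, b in zip(start, end)]


def interpolation_path(start, end, steps):
    """Return `steps` codes evenly spaced from `start` to `end`, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp(start, end, i / (steps - 1)) for i in range(steps)]


# Toy attribute codes. In the identity-preserving case the identity code
# stays fixed and only the attribute code follows this path.
attr_a = [0.0, 2.0]
attr_b = [1.0, 0.0]
path = interpolation_path(attr_a, attr_b, steps=5)
for code in path:
    print(code)
```

Each code on the path would be combined with an identity code, mapped to the generator's latent space and rendered, producing one frame of the interpolation sequence.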

Note: Images and illustrations other than the code outputs are obtained from the original research paper.
