Exploring ArtLine – To Create Line Art Portraits, Movie Posters & Cartoonize Images in Python

Deep learning and computer vision have evolved and done wonders time and again. Today we are going to talk about one such recent project called ‘ArtLine’, which uses deep learning to produce fine-quality line art portraits. It can also be used to generate movie posters and cartoonize images. At the time of writing, it is trending on both GitHub and Papers with Code. It was created by Vijish Madhavan, a deep learning researcher.

The model has been built on the APDrawing dataset along with Anime sketch line art pairs, combining techniques drawn from several research papers: self-attention, progressive resizing and a feature-based generator loss. Stacking these methods together produces high-quality results. PyTorch and the Fastai library are the primary tools. The model generates finer lines and edges in the sketch than most existing methods. You can try the demo from the Colab notebook with any portrait picture: the notebook expects an image URL, downloads it and converts it to the required image format. You can also clone the repository or tweak the code to use a local image file, and in under two minutes (when executing with a GPU) have a look at the results.
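For reference, here is a minimal inference sketch of what the Colab demo does. It assumes a fastai v1 environment, that the exported ArtLine_650.pkl checkpoint from the repository is in the working directory, and that the FeatureLoss class used during training (shown later in this article) is already defined so the learner can be unpickled; the URL is a placeholder.

 import fastai
 from fastai.vision import *
 from fastai.utils.mem import *
 import numpy as np
 import requests
 from io import BytesIO
 from PIL import Image as PILImage

 # Placeholder URL - replace with any portrait picture
 url = 'https://example.com/portrait.jpg'
 response = requests.get(url)
 img = PILImage.open(BytesIO(response.content)).convert('RGB')

 # Load the released pre-trained line art generator
 learn = load_learner(Path('.'), 'ArtLine_650.pkl')

 # Convert the PIL image to a fastai Image and predict the line art portrait
 img_fastai = Image(pil2tensor(img, np.float32).div_(255))
 pred, _, _ = learn.predict(img_fastai)
 pred.show(figsize=(8, 8))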


Get a cartoon version of Tom Hanks

Here’s a movie poster generated by ArtLine

As of now, the movie poster and cartoon generating models have not been released; we can expect them in the near future. Only the pre-trained line art portrait model is available.


Let’s explore how the model training takes place.

Importing necessary libraries

 import torch
 import torch.nn as nn
 import fastai
 from fastai.vision import *
 from fastai.callbacks import *
 from fastai.vision.gan import *
 from torchvision.models import vgg16_bn
 from fastai.utils.mem import *
 from PIL import Image
 import numpy as np
 from torch.autograd import Variable
 import torchvision.transforms as transforms 
Edge detection – this function applies Sobel convolution kernels to the image to extract an edge-gradient map, which is later used as an extra pixel transform.
 def _gradient_img(img):
     img = img.squeeze(0)
     ten = torch.unbind(img)
     x = ten[0].unsqueeze(0).unsqueeze(0)
     # Horizontal Sobel kernel as a fixed (non-trainable) convolution
     a = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]])
     conv1 = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=1, bias=False)
     conv1.weight = nn.Parameter(torch.from_numpy(a).float().unsqueeze(0).unsqueeze(0))
     G_x = conv1(Variable(x)).data.view(1, x.shape[2], x.shape[3])
     # Vertical Sobel kernel
     b = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]])
     conv2 = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=1, bias=False)
     conv2.weight = nn.Parameter(torch.from_numpy(b).float().unsqueeze(0).unsqueeze(0))
     G_y = conv2(Variable(x)).data.view(1, x.shape[2], x.shape[3])
     # Gradient magnitude combining both directions
     G = torch.sqrt(torch.pow(G_x, 2) + torch.pow(G_y, 2))
     return G
 gradient = TfmPixel(_gradient_img)
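As a quick illustrative check (not part of the original notebook), the function can be applied directly to a random 3-channel image tensor:

 sample = torch.rand(3, 128, 128)   # stand-in for an RGB image tensor
 edges = _gradient_img(sample)
 print(edges.shape)                 # torch.Size([1, 128, 128]) - single-channel edge map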

Paths – pointing to the saved APDrawing dataset and selected pictures from the Anime sketch colourization pair dataset.

 path = Path('/content/gdrive/My Drive/Apdrawing')
 # Blended facial features
 path_hr = Path('/content/gdrive/My Drive/Apdrawing/draw tiny')
 path_lr = Path('/content/gdrive/My Drive/Apdrawing/Tiny Real')
 # Portrait pair
 path_hr3 = Path('/content/gdrive/My Drive/Apdrawing/drawing')
 path_lr3 = Path('/content/gdrive/My Drive/Apdrawing/Real')
 # Architecture - a pretrained ResNet-34 is used as the encoder
 arch = models.resnet34
Detecting Facial Features
 src = ImageImageList.from_folder(path_lr).split_by_rand_pct(0.3, seed=42)
 def get_data(bs,size):
     data = (src.label_from_func(lambda x: path_hr/x.name)
            .transform(get_transforms(xtra_tfms=[gradient()]), size=size, tfm_y=True)
            .databunch(bs=bs,num_workers = 0).normalize(imagenet_stats, do_y=True))
     data.c = 3
     return data 

Progressive resizing, supported by the Fastai library, gradually increases the image size (and adjusts the learning rate) across training stages, which helps the model generalize as it moves from low to high resolution.

64px

 bs,size=20, 64
 data = get_data(bs,size)
 data.show_batch(ds_type=DatasetType.Valid, rows=2, figsize=(9,9))
 t = data.valid_ds[0][1].data
 t = torch.stack([t,t])
 def gram_matrix(x):
     n,c,h,w = x.size()
     x = x.view(n, c, -1)
     return (x @ x.transpose(1,2))/(c*h*w)
 gram_matrix(t)
 base_loss = F.l1_loss
 vgg_m = vgg16_bn(True).features.cuda().eval()
 requires_grad(vgg_m, False)
 blocks = [j-1 for j,o in enumerate(children(vgg_m)) if isinstance(o,nn.MaxPool2d)]
 blocks, [vgg_m[i] for i in blocks] 

Perceptual loss is calculated for image transformations based on the VGG-16 model, which speeds up training. This approach combines a per-pixel loss between the output and ground-truth images with a perceptual loss computed from high-level features extracted from a pre-trained network, plus a Gram-matrix style term; the combined loss is then used to train the feed-forward generator.

 class FeatureLoss(nn.Module):
     def __init__(self, m_feat, layer_ids, layer_wgts):
         super().__init__()
         self.m_feat = m_feat
         self.losses = [self.m_feat[i] for i in layer_ids]
         self.hooks = hook_outputs(self.losses, detach=False)
         self.wgts = layer_wgts
         self.metric_names = ['pixel',] + [f'feat_{i}' for i in range(len(layer_ids))] + [f'gram_{i}' for i in range(len(layer_ids))]

     def make_features(self, x, clone=False):
         self.m_feat(x)
         return [(p.clone() if clone else p) for p in self.hooks.stored]

     def forward(self, input, target):
         out_feat = self.make_features(target, clone=True)
         in_feat = self.make_features(input)
         # Per-pixel L1 loss
         self.feat_losses = [base_loss(input, target)]
         # Feature (perceptual) losses from the hooked VGG layers
         self.feat_losses += [base_loss(f_in, f_out)*w
                              for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
         # Gram-matrix (style) losses
         self.feat_losses += [base_loss(gram_matrix(f_in), gram_matrix(f_out))*w**2 * 5e3
                              for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
         self.metrics = dict(zip(self.metric_names, self.feat_losses))
         return sum(self.feat_losses)

     def __del__(self): self.hooks.remove()

 feat_loss = FeatureLoss(vgg_m, blocks[2:5], [5,15,2])
 wd = 1e-3
 y_range = (-3.,3.)
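As an illustrative sanity check (not part of the original notebook, and assuming a CUDA device since vgg_m lives on the GPU), the loss can be evaluated on random tensors:

 x = torch.randn(2, 3, 64, 64).cuda()
 y = torch.randn(2, 3, 64, 64).cuda()
 print(feat_loss(x, y))      # scalar combining pixel, feature and gram terms
 print(feat_loss.metrics)    # per-term breakdown keyed by metric_names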

The generator is a U-Net with self-attention and spectral normalization, trained with the ‘NoGAN’ approach: minimal time is spent on direct GAN training, and instead the generator and critic are pretrained separately. This technique was introduced in the DeOldify project and helps considerably in capturing accurate facial features.

 def create_gen_learner():
     return unet_learner(data, arch, wd=wd, blur=True, norm_type=NormType.Spectral,
                         self_attention=True, y_range=y_range,
                         loss_func=feat_loss, callback_fns=LossMetrics)
 gc.collect();
 learn_gen = create_gen_learner()
 learn_gen.lr_find()
 lr = 1e-01
 epoch = 5
Fitting the model
 def do_fit(save_name, lrs=slice(lr), pct_start=0.9):
     learn_gen.fit_one_cycle(epoch, lrs, pct_start=pct_start,)
     learn_gen.save(save_name)
     learn_gen.show_results(rows=1, imgsize=5)
 do_fit('da', slice(lr))
 #lr*10
 learn_gen.unfreeze()
 learn_gen.lr_find()
 epoch = 5
 do_fit('db', slice(1E-2)) 

Results for different pixel values

128px

 data = get_data(8,128)
 learn_gen.data = data
 learn_gen.freeze()
 gc.collect()
 learn_gen.load('db');
 epoch = 5
 lr = 1E-03
 do_fit('db2',slice(lr))
 learn_gen.unfreeze()
 epoch = 5
 do_fit('db3', slice(1e-02,1e-5), pct_start=0.3) 

192px

 data = get_data(5,192)
 learn_gen.data = data
 learn_gen.freeze()
 gc.collect()
 learn_gen.load('db3');
 epoch = 5
 lr = 1E-06
 do_fit('db4')
 learn_gen.unfreeze()
 epoch = 5
 do_fit('db5', slice(1e-06,1e-4), pct_start=0.3) 
Acquiring data for portrait images
 src = ImageImageList.from_folder(path_lr3).split_by_rand_pct(0.2, seed=42)
 def get_data(bs,size):
     data = (src.label_from_func(lambda x: path_hr3/x.name)
            .transform(get_transforms(max_zoom=2.), size=size, tfm_y=True).databunch(bs=bs,num_workers = 0).normalize(imagenet_stats, do_y=True))
     data.c = 3
     return data 

128px

 data = get_data(8,128)
 learn_gen.data = data
 learn_gen.freeze()
 gc.collect()
 learn_gen.load('db5');
 data.show_batch(ds_type=DatasetType.Valid, rows=2, figsize=(9,9))
 learn_gen.lr_find()
 epoch = 5
 lr = 1e-03
 do_fit('db6')
 learn_gen.unfreeze()
 epoch = 5
 do_fit('db7', slice(6.31E-07,1e-5), pct_start=0.3) 

192px

 data = get_data(4,192)
 learn_gen.data = data
 learn_gen.freeze()
 gc.collect()
 learn_gen.load('db7');
 learn_gen.lr_find()
 epoch = 5
 lr = 4.37E-05
 do_fit('db8')
 learn_gen.unfreeze()
 epoch = 5
 do_fit('db9', slice(1.00E-05,1e-3), pct_start=0.3) 
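After the final stage, the trained generator can be exported so it can later be loaded with load_learner for inference, as in the demo earlier. This final step is a sketch based on fastai's standard export mechanism; the file name matches the checkpoint published in the repository but is otherwise an assumption.

 # Export the trained generator for inference (assumed final step;
 # the published checkpoint is named ArtLine_650.pkl)
 learn_gen.export('ArtLine_650.pkl')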

Endnotes

Limitations – the model needs a smooth or plain background, works poorly under strong lighting or shadows, and also struggles on low-quality images.

Nevertheless, ArtLine achieves impressive, near state-of-the-art results, and the project is under active development.
