Deep learning and computer vision have evolved and done wonders time and again. Today we are going to talk about one such recent project, ‘ArtLine’, which uses deep learning to produce fine-quality line art portraits. It can also be used to generate movie posters and cartoonize images, and it is currently trending on both GitHub and Papers with Code. ArtLine was created by Vijish Madhavan, a deep learning researcher.
The model has been built on the APDrawing dataset together with anime sketch colourization pairs, combining techniques drawn from several research papers: self-attention, progressive resizing and a feature (generator) loss. It shows how stacking these methods can produce high-quality results, generating finer lines and edges in the sketch than most existing methods. PyTorch and Fastai are the primary libraries used. You can try the demo from the project’s Colab Notebook with any portrait picture: the notebook expects an image URL, which it downloads and converts into an image. You can also clone the repository or tweak the code to use a local image file, and in under two minutes (when executing on a GPU) have a look at the results.
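For reference, the inference step in the demo notebook boils down to something like the sketch below. The URL and the exported model file name are placeholders (check the repository for the actual weights file), and note that the FeatureLoss class used during training must be defined or importable before the pickled learner can be loaded:

import requests
from io import BytesIO
import numpy as np
import PIL.Image
from fastai.vision import load_learner, pil2tensor, Image

url = 'https://example.com/portrait.jpg'   # placeholder: any portrait photo URL
photo = PIL.Image.open(BytesIO(requests.get(url).content)).convert('RGB')

# Load the exported generator (file name illustrative) and run it on the photo
learn = load_learner('.', 'ArtLine_650.pkl')
p, img_hr, b = learn.predict(Image(pil2tensor(photo, np.float32).div_(255)))
Image(img_hr).show(figsize=(8, 8))   # the predicted line art portrait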
Get a cartoon version of Tom Hanks
Here’s a movie poster generated by ArtLine
As of now, the movie poster and cartoon-generating models have not been released; only the pre-trained line art portrait model is available. We can expect the others in the near future.
Let’s explore how the model training takes place.
Importing necessary libraries
import torch
import torch.nn as nn
import fastai
from fastai.vision import *
from fastai.callbacks import *
from fastai.vision.gan import *
from torchvision.models import vgg16_bn
from fastai.utils.mem import *
from PIL import Image
import numpy as np
from torch.autograd import Variable
import torchvision.transforms as transforms
Edge Detection – this function applies fixed Sobel convolution kernels to an image to extract edge (gradient) information, and wraps the result as a pixel transform so it can be used as an extra data augmentation.
def _gradient_img(img):
    img = img.squeeze(0)
    ten = torch.unbind(img)
    x = ten[0].unsqueeze(0).unsqueeze(0)
    # Sobel kernel for horizontal gradients
    a = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]])
    conv1 = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=1, bias=False)
Assigning the fixed Sobel kernels as the convolution weights
    conv1.weight = nn.Parameter(torch.from_numpy(a).float().unsqueeze(0).unsqueeze(0))
    G_x = conv1(Variable(x)).data.view(1, x.shape[2], x.shape[3])
    # Sobel kernel for vertical gradients
    b = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]])
    conv2 = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=1, bias=False)
    conv2.weight = nn.Parameter(torch.from_numpy(b).float().unsqueeze(0).unsqueeze(0))
    G_y = conv2(Variable(x)).data.view(1, x.shape[2], x.shape[3])
    # Combine into the gradient magnitude
    G = torch.sqrt(torch.pow(G_x, 2) + torch.pow(G_y, 2))
    return G

gradient = TfmPixel(_gradient_img)
PATH – pointing to the saved APDrawing dataset and selected pictures from the anime sketch colourization pairs.
path = Path('/content/gdrive/My Drive/Apdrawing')

# Blended facial features
path_hr = Path('/content/gdrive/My Drive/Apdrawing/draw tiny')
path_lr = Path('/content/gdrive/My Drive/Apdrawing/Tiny Real')

# Portrait pair
path_hr3 = Path('/content/gdrive/My Drive/Apdrawing/drawing')
path_lr3 = Path('/content/gdrive/My Drive/Apdrawing/Real')

# Architecture – a pretrained ResNet-34 is used as the encoder
arch = models.resnet34
Detecting Facial Features
src = ImageImageList.from_folder(path_lr).split_by_rand_pct(0.3, seed=42)

def get_data(bs, size):
    data = (src.label_from_func(lambda x: path_hr/x.name)
            .transform(get_transforms(xtra_tfms=[gradient()]), size=size, tfm_y=True)
            .databunch(bs=bs, num_workers=0)
            .normalize(imagenet_stats, do_y=True))
    data.c = 3
    return data
Progressive resizing, a technique from the Fastai library, gradually increases the image size (and adjusts the learning rate) across successive training stages, which helps the model generalise as it moves from coarse to fine detail.
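Concretely, the 64px, 128px and 192px stages below all repeat the same pattern. As a rough sketch (assuming the get_data function above and the learn_gen learner created in the next section; the per-stage batch sizes and learning rates are set by hand in the actual cells):

for size, bs in [(64, 20), (128, 8), (192, 5)]:
    learn_gen.data = get_data(bs, size)          # swap in larger images
    learn_gen.freeze()                           # train the decoder head first
    learn_gen.fit_one_cycle(epoch, slice(lr))
    learn_gen.unfreeze()                         # then fine-tune the whole U-Net
    learn_gen.fit_one_cycle(epoch, slice(lr/100, lr), pct_start=0.3)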
64px
bs, size = 20, 64
data = get_data(bs, size)
data.show_batch(ds_type=DatasetType.Valid, rows=2, figsize=(9,9))

t = data.valid_ds[0][1].data
t = torch.stack([t, t])

def gram_matrix(x):
    n, c, h, w = x.size()
    x = x.view(n, c, -1)
    return (x @ x.transpose(1, 2)) / (c * h * w)

gram_matrix(t)

base_loss = F.l1_loss

vgg_m = vgg16_bn(True).features.cuda().eval()
requires_grad(vgg_m, False)
blocks = [j - 1 for j, o in enumerate(children(vgg_m)) if isinstance(o, nn.MaxPool2d)]
blocks, [vgg_m[i] for i in blocks]
A perceptual loss is computed for the image transformation using the pretrained VGG-16 model, which speeds up training. The approach combines a per-pixel loss between the output and ground-truth images with a perceptual loss based on high-level features extracted from the pretrained network, and the combined loss is used to train the feed-forward generator.
class FeatureLoss(nn.Module):
    def __init__(self, m_feat, layer_ids, layer_wgts):
        super().__init__()
        self.m_feat = m_feat
        self.losses = [self.m_feat[i] for i in layer_ids]
        self.hooks = hook_outputs(self.losses, detach=False)
        self.wgts = layer_wgts
        self.metric_names = ['pixel',] + [f'feat_{i}' for i in range(len(layer_ids))] \
                                       + [f'gram_{i}' for i in range(len(layer_ids))]

    def make_features(self, x, clone=False):
        self.m_feat(x)
        return [(p.clone() if clone else p) for p in self.hooks.stored]

    def forward(self, input, target):
        out_feat = self.make_features(target, clone=True)
        in_feat = self.make_features(input)
        # Per-pixel loss plus feature and gram-matrix losses from the VGG activations
        self.feat_losses = [base_loss(input, target)]
        self.feat_losses += [base_loss(f_in, f_out) * w
                             for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
        self.feat_losses += [base_loss(gram_matrix(f_in), gram_matrix(f_out)) * w**2 * 5e3
                             for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
        self.metrics = dict(zip(self.metric_names, self.feat_losses))
        return sum(self.feat_losses)

    def __del__(self): self.hooks.remove()

feat_loss = FeatureLoss(vgg_m, blocks[2:5], [5, 15, 2])
wd = 1e-3
y_range = (-3., 3.)
The generator is a U-Net with self-attention and spectral normalisation, trained with the NoGAN approach introduced in the DeOldify project: the generator (and, in the full recipe, a critic) is pretrained separately so that only minimal time needs to be spent in direct GAN training. This stabilises the generated images and helps greatly in capturing accurate facial features; a sketch of the critic/GAN phase follows the generator code below.
def create_gen_learner():
    return unet_learner(data, arch, wd=wd, blur=True, norm_type=NormType.Spectral,
                        self_attention=True, y_range=(-3.0, 3.0),
                        loss_func=feat_loss, callback_fns=LossMetrics)

gc.collect()
learn_gen = create_gen_learner()
learn_gen.lr_find()
lr = 1e-01
epoch = 5
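The released notebook only shows this generator pretraining. For context, here is a hedged sketch of what the critic pretraining and brief GAN phase look like in the fastai/DeOldify NoGAN recipe the paragraph above refers to; the folder names ('image_gen' for saved generator outputs, 'drawing' for real drawings), batch size and learning rates are illustrative and not taken from ArtLine:

from functools import partial
from fastai.vision import *
from fastai.vision.gan import *

# 1. Build a classification-style DataBunch of generated vs. real drawings
#    (generator outputs are assumed to have been saved to path/'image_gen').
data_crit = (ImageList.from_folder(path, include=['image_gen', 'drawing'])
             .split_by_rand_pct(0.1, seed=42)
             .label_from_folder(classes=['image_gen', 'drawing'])
             .transform(get_transforms(max_zoom=2.), size=192)
             .databunch(bs=8).normalize(imagenet_stats))

# 2. Pretrain a critic to tell generated drawings from real ones.
learn_crit = Learner(data_crit, gan_critic(), metrics=accuracy_thresh_expand,
                     loss_func=AdaptiveLoss(nn.BCEWithLogitsLoss()), wd=wd)
learn_crit.fit_one_cycle(6, 1e-3)

# 3. Combine generator and critic in a GANLearner and train briefly,
#    switching between the two as the critic reaches a threshold.
switcher = partial(AdaptiveGANSwitcher, critic_thresh=0.65)
learn_gan = GANLearner.from_learners(learn_gen, learn_crit, weights_gen=(1., 50.),
                                     show_img=False, switcher=switcher,
                                     opt_func=partial(optim.Adam, betas=(0., 0.99)), wd=wd)
learn_gan.fit(10, 1e-4)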
Fitting the model
def do_fit(save_name, lrs=slice(lr), pct_start=0.9):
    learn_gen.fit_one_cycle(epoch, lrs, pct_start=pct_start)
    learn_gen.save(save_name)
    learn_gen.show_results(rows=1, imgsize=5)

do_fit('da', slice(lr))  # lr*10

learn_gen.unfreeze()
learn_gen.lr_find()
epoch = 5
do_fit('db', slice(1E-2))
Results for different pixel values
128px
data = get_data(8, 128)
learn_gen.data = data
learn_gen.freeze()
gc.collect()
learn_gen.load('db')

epoch = 5
lr = 1E-03
do_fit('db2', slice(lr))

learn_gen.unfreeze()
epoch = 5
do_fit('db3', slice(1e-02, 1e-5), pct_start=0.3)
192px
data = get_data(5, 192)
learn_gen.data = data
learn_gen.freeze()
gc.collect()
learn_gen.load('db3')

epoch = 5
lr = 1E-06
do_fit('db4')

learn_gen.unfreeze()
epoch = 5
do_fit('db5', slice(1e-06, 1e-4), pct_start=0.3)
Acquiring data for portrait images
src = ImageImageList.from_folder(path_lr3).split_by_rand_pct(0.2, seed=42)

def get_data(bs, size):
    data = (src.label_from_func(lambda x: path_hr3/x.name)
            .transform(get_transforms(max_zoom=2.), size=size, tfm_y=True)
            .databunch(bs=bs, num_workers=0)
            .normalize(imagenet_stats, do_y=True))
    data.c = 3
    return data
128px
data = get_data(8, 128)
learn_gen.data = data
learn_gen.freeze()
gc.collect()
learn_gen.load('db5')
data.show_batch(ds_type=DatasetType.Valid, rows=2, figsize=(9,9))

learn_gen.lr_find()
epoch = 5
lr = 1e-03
do_fit('db6')

learn_gen.unfreeze()
epoch = 5
do_fit('db7', slice(6.31E-07, 1e-5), pct_start=0.3)
192px
data = get_data(4, 192)
learn_gen.data = data
learn_gen.freeze()
gc.collect()
learn_gen.load('db7')

learn_gen.lr_find()
epoch = 5
lr = 4.37E-05
do_fit('db8')

learn_gen.unfreeze()
epoch = 5
do_fit('db9', slice(1.00E-05, 1e-3), pct_start=0.3)
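After the final stage, the trained generator can be exported so it can later be loaded with load_learner for inference, as in the demo sketch near the top of the article (the file name below is illustrative):

learn_gen.export('ArtLine.pkl')   # saves the pickled learner under learn_gen.path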
Endnotes
Limitations – the model needs smooth or plain backgrounds and works poorly with strong lighting or shadows; it also struggles with low-quality images.
Nevertheless, ArtLine achieves impressive results that approach the state of the art, and the project is under active development.