Generate Your ML Code In Few Clicks Using Train Generator

Traingenerator is a Streamlit-based web app that generates template code for machine learning. It covers the stages of a complete model-building workflow: data loading, preprocessing, model definition, hyperparameter settings, and related options. This open-source tool was created by Johannes Rieke, a machine learning engineer. It eases the work of data scientists and also helps non-technical people get started in data science and machine learning. The generated code can be opened in a Google Colab notebook or downloaded in .py or .ipynb format.

Traingenerator also lets users add their own custom templates. So far, only image classification templates have been released; object detection and other use cases are planned.

The left sidebar of the web app holds the parameter settings. Under framework selection, the options are PyTorch and scikit-learn. For PyTorch, the available models are AlexNet, ResNet, VGG, and DenseNet, with an option to use weights pre-trained on ImageNet. For scikit-learn, the models are support vector machine, random forest, perceptron, k-nearest neighbours, and decision tree. The input data can be NumPy files or image files. Preprocessing options include resizing and centre-cropping images to a size compatible with the model, and scaling with the mean and standard deviation expected by the pre-trained model.

Next come the training options, including whether to use a GPU and whether to save model checkpoints. Hyperparameters include the loss function (CrossEntropyLoss or BCEWithLogitsLoss) and the optimizer (Adam, Adadelta, Adagrad, Adamax, RMSprop, SGD). The learning rate, batch size, number of epochs, and how often progress is printed (every n batches) can be set manually. Lastly, metrics can be logged to TensorBoard or comet.ml, or logging can be turned off.
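To make the idea of template code generation concrete, here is a minimal sketch of how a set of sidebar choices could be turned into code. It uses Python's built-in string.Template and is only an illustration: the template text, the generate_code function and the option names are hypothetical, and this is not Traingenerator's actual template engine.

```python
from string import Template

# Hypothetical template: placeholders stand for the choices made in the sidebar.
TEMPLATE = Template(
    "model = models.${model}(pretrained=${pretrained})\n"
    "loss_func = nn.${loss}()\n"
    "optimizer = optim.${optimizer}(model.parameters(), lr=${lr})\n"
)


def generate_code(options):
    """Fill the template with the selected options and return the code as text."""
    return TEMPLATE.substitute(options)


# Example selection, mirroring the kind of options listed above.
options = {
    "model": "resnet18",
    "pretrained": True,
    "loss": "CrossEntropyLoss",
    "optimizer": "Adam",
    "lr": 0.001,
}

code = generate_code(options)
print(code)
```

Changing a value in options, for example the optimizer to "SGD", changes only the corresponding line of the generated text, which is the basic appeal of a template-driven generator.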

Code Snippet

There are two ways to use Traingenerator: through the web app (as described above) or by running it locally. To run it locally:

git clone https://github.com/jrieke/traingenerator.git

cd traingenerator

pip install -r requirements.txt

streamlit run app/main.py

Code Generated for sklearn 

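 import numpy as np
 import sklearn.preprocessing  # submodules are imported explicitly so that
 import sklearn.utils          # sklearn.preprocessing and sklearn.utils resolve
 from sklearn.tree import DecisionTreeClassifier
 from torchvision import datasets, transforms
 import urllib.request         # plain `import urllib` does not guarantee urllib.request
 import zipfile
 from tensorboardX import SummaryWriter
 from datetime import datetime 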

# Downloading example data (comment out this part to use your own data).

 url = "https://github.com/jrieke/traingenerator/raw/main/data/fake-image-data.zip"
 zip_path, _ = urllib.request.urlretrieve(url)
 with zipfile.ZipFile(zip_path, "r") as f:
     f.extractall("data") 

# Data insertion

 train_data = "data/image-data"  # required
 val_data = "data/image-data"    # optional
 test_data = None                # optional 

# Setting up logging.

 experiment_id = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
 writer = SummaryWriter(logdir=f"logs/{experiment_id}") 

# preprocessing

# Setting up a scaler.

 scaler = sklearn.preprocessing.StandardScaler()
 def preprocess(data, name):
     if data is None:  # val/test can be empty
         return None 

    # Reading image files to pytorch dataset 

     transform = transforms.Compose([
         transforms.Resize(28), 
         transforms.CenterCrop(28), 
         transforms.ToTensor()
     ])
     data = datasets.ImageFolder(data, transform=transform) 

    # Converting images to NumPy arrays.

     images_shape = (len(data), *data[0][0].shape)
     images = np.zeros(images_shape)
     labels = np.zeros(len(data))
     for i, (image, label) in enumerate(data):
         images[i] = image
         labels[i] = label
     images = images.reshape(len(images), -1) 

   # Scaling to mean 0 and std 1.

     if name == "train":
         scaler.fit(images)
     images = scaler.transform(images) 

    # Shuffling over the train set.

     if name == "train":
         images, labels = sklearn.utils.shuffle(images, labels)
     return [images, labels] 
 processed_train_data = preprocess(train_data, "train")
 processed_val_data = preprocess(val_data, "val")
 processed_test_data = preprocess(test_data, "test")

# Setting up the model and an evaluation helper.

 model = DecisionTreeClassifier()
 def evaluate(data, name):
     if data is None:  # val/test can be empty
         return
     images, labels = data
     acc = model.score(images, labels)
     print(f"{name + ':':6} accuracy: {acc}")
     writer.add_scalar(f"{name}_accuracy", acc) 

# Train on train_data.

model.fit(*processed_train_data)

DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini', max_depth=None, max_features=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2,min_weight_fraction_leaf=0.0, presort='deprecated', random_state=None, splitter='best')

# Evaluation

 evaluate(processed_train_data, "train")
 evaluate(processed_val_data, "val")
 evaluate(processed_test_data, "test") 
 train: accuracy: 1.0
 val:   accuracy: 1.0 

The complete notebook generated by Traingenerator can be viewed here.
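One detail of the preprocess function above is worth spelling out: the scaler is fitted only when name == "train", and the same fitted statistics are then reused for the validation and test data. The following pure-Python sketch shows that pattern on tiny made-up rows. The fit_scaler and transform helpers are hypothetical stand-ins written for this illustration; they are not scikit-learn functions, although, like StandardScaler, they use the population standard deviation and leave constant features unscaled.

```python
from statistics import fmean, pstdev


def fit_scaler(rows):
    """Compute per-feature mean and standard deviation from the training rows."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # A constant feature has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(rows, means, stds):
    """Scale rows with statistics that were computed elsewhere (the train split)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Made-up data: two features, the second one constant in the training split.
train = [[0.0, 10.0], [2.0, 10.0], [4.0, 10.0]]
val = [[2.0, 10.0], [6.0, 12.0]]

means, stds = fit_scaler(train)               # fit on train only
scaled_train = transform(train, means, stds)
scaled_val = transform(val, means, stds)      # reuse the train statistics

print(means)
print(scaled_val)
```

Fitting on the training split alone keeps information about the validation and test data from leaking into the preprocessing step.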

Code generated for PyTorch

 import numpy as np
 import torch
 from torch import optim, nn
 from torch.utils.data import DataLoader, TensorDataset
 from torchvision import models, datasets, transforms
 from ignite.engine import Events, create_supervised_trainer, create_supervised_evaluator
 from ignite.metrics import Accuracy, Loss
 from datetime import datetime
 from tensorboardX import SummaryWriter
 from pathlib import Path 

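# Data loading is the same as in the sklearn example above.
# Note: the setup section that defines device, kwargs, lr, batch_size, num_epochs,
# print_every, checkpoint_dir and writer is not reproduced in this listing.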
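The PyTorch listing that follows uses several names (device, kwargs, lr, batch_size, num_epochs, print_every, checkpoint_dir) whose definitions are not shown in this article. A minimal sketch of what such a setup section might look like is given here. The values are assumptions for illustration, except that num_epochs = 5 and print_every = 1 are consistent with the training log shown further down. The use_cuda flag is hard-coded so that the sketch runs without a GPU; in a real script it would typically be set from torch.cuda.is_available(). The writer object can be created as in the sklearn example.

```python
from datetime import datetime
from pathlib import Path

# Assumption: hard-coded so the sketch runs anywhere. In a real script this
# would typically be set from torch.cuda.is_available().
use_cuda = False

# Device given as a string, which model.to(...) accepts.
device = "cuda" if use_cuda else "cpu"

# Extra DataLoader arguments; pinned memory is only useful when copying to a GPU.
kwargs = {"pin_memory": True} if use_cuda else {}

# Hyperparameters (illustrative values).
lr = 0.001
batch_size = 128
num_epochs = 5
print_every = 1  # print progress after every batch

# One checkpoint directory per run, named after the start time.
experiment_id = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
checkpoint_dir = Path("checkpoints") / experiment_id
checkpoint_dir.mkdir(parents=True, exist_ok=True)
```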

# preprocessing

 def preprocess(data, name):
     if data is None:  # val/test can be empty
         return None 

    # Reading image files to pytorch dataset.

     transform = transforms.Compose([
         transforms.Resize(256), 
         transforms.CenterCrop(224), 
         transforms.ToTensor(), 
     ])
     dataset = datasets.ImageFolder(data, transform=transform)
     loader = DataLoader(dataset, batch_size=batch_size, shuffle=(name=="train"), **kwargs)
     return loader
 train_loader = preprocess(train_data, "train")
 val_loader = preprocess(val_data, "val")
 test_loader = preprocess(test_data, "test") 

# Setting up model, loss, optimizer.

 model = models.resnet18(pretrained=True)
 model = model.to(device)
 loss_func = nn.CrossEntropyLoss()
 optimizer = optim.Adam(model.parameters(), lr=lr) 

# Setting up pytorch-ignite trainer and evaluator.

 trainer = create_supervised_trainer(
     model,
     optimizer,
     loss_func,
     device=device,
 )
 metrics = {
     "accuracy": Accuracy(),
     "loss": Loss(loss_func),
 }
 evaluator = create_supervised_evaluator(
     model, metrics=metrics, device=device
 )
 @trainer.on(Events.ITERATION_COMPLETED(every=print_every))
 def log_batch(trainer):
     batch = (trainer.state.iteration - 1) % trainer.state.epoch_length + 1
     print(
         f"Epoch {trainer.state.epoch} / {num_epochs}, "
         f"batch {batch} / {trainer.state.epoch_length}: "
         f"loss: {trainer.state.output:.3f}"
     )
 @trainer.on(Events.EPOCH_COMPLETED)
 def log_epoch(trainer):
     print(f"Epoch {trainer.state.epoch} / {num_epochs} average results: ")
     def log_results(name, metrics, epoch):
         print(
             f"{name + ':':6} loss: {metrics['loss']:.3f}, "
             f"accuracy: {metrics['accuracy']:.3f}"
         )
         writer.add_scalar(f"{name}_loss", metrics["loss"], epoch)
         writer.add_scalar(f"{name}_accuracy", metrics["accuracy"], epoch) 

    # Training data.

     evaluator.run(train_loader)
     log_results("train", evaluator.state.metrics, trainer.state.epoch) 

    # Validation data.

     if val_loader:
         evaluator.run(val_loader)
         log_results("val", evaluator.state.metrics, trainer.state.epoch) 

    # Testing data.

     if test_loader:
         evaluator.run(test_loader)
         log_results("test", evaluator.state.metrics, trainer.state.epoch)
     print()
     print("-" * 80)
     print()
# Saving a checkpoint after every epoch.

 @trainer.on(Events.EPOCH_COMPLETED)
 def checkpoint_model(trainer):
     torch.save(model, checkpoint_dir / f"model-epoch{trainer.state.epoch}.pt") 

# Starting training.

trainer.run(train_loader, max_epochs=num_epochs)

 Epoch 1 / 5, batch 1 / 1: loss: 8.112
 Epoch 1 / 5 average results: 
 train: loss: 10.275, accuracy: 0.000
 val:   loss: 11.407, accuracy: 0.000
 Epoch 2 / 5, batch 1 / 1: loss: 0.152
 Epoch 2 / 5 average results: 
 train: loss: 7.251, accuracy: 0.000
 val:   loss: 10.479, accuracy: 0.000
 Epoch 3 / 5, batch 1 / 1: loss: 0.185
 Epoch 3 / 5 average results: 
 train: loss: 4.322, accuracy: 0.500
 val:   loss: 10.263, accuracy: 0.000
 Epoch 4 / 5, batch 1 / 1: loss: 0.000
 Epoch 4 / 5 average results: 
 train: loss: 2.429, accuracy: 0.500
 val:   loss: 9.824, accuracy: 0.000
 Epoch 5 / 5, batch 1 / 1: loss: 0.000
 Epoch 5 / 5 average results: 
 train: loss: 1.521, accuracy: 0.750
 val:   loss: 9.791, accuracy: 0.000 

The complete notebook generated by Traingenerator can be viewed here.
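The PyTorch code above registers its logging and checkpointing functions as event handlers on the trainer, and log_batch converts the global iteration count into a batch number within the current epoch using (iteration - 1) % epoch_length + 1. The toy class below sketches that pattern in plain Python so it can be tried without installing anything. MiniEngine is a hypothetical stand-in written for this illustration; it is not the pytorch-ignite API and leaves out almost everything the real engine does.

```python
class MiniEngine:
    """Toy event-driven training loop (illustration only, not pytorch-ignite)."""

    def __init__(self, step_fn):
        self.step_fn = step_fn
        self.handlers = {"iteration_completed": [], "epoch_completed": []}
        self.iteration = 0      # counts batches across all epochs
        self.epoch = 0
        self.epoch_length = 0
        self.output = None

    def on(self, event, every=1):
        """Decorator that registers a handler to run on every n-th event."""
        def decorator(handler):
            self.handlers[event].append((every, handler))
            return handler
        return decorator

    def _fire(self, event, counter):
        for every, handler in self.handlers[event]:
            if counter % every == 0:
                handler(self)

    def run(self, batches, max_epochs):
        batches = list(batches)
        self.epoch_length = len(batches)
        for epoch in range(1, max_epochs + 1):
            self.epoch = epoch
            for batch in batches:
                self.iteration += 1
                self.output = self.step_fn(batch)
                self._fire("iteration_completed", self.iteration)
            self._fire("epoch_completed", self.epoch)


# The "training step" here just sums the batch, standing in for a loss value.
engine = MiniEngine(step_fn=sum)
log = []


@engine.on("iteration_completed", every=2)
def log_batch(engine):
    # Same formula as in the listing: batch number within the current epoch.
    batch = (engine.iteration - 1) % engine.epoch_length + 1
    log.append((engine.epoch, batch, engine.output))


engine.run([[1, 2], [3, 4], [5, 6]], max_epochs=2)
print(log)  # [(1, 2, 7), (2, 1, 3), (2, 3, 11)]
```

With three batches per epoch and every=2, the handler fires on global iterations 2, 4 and 6, which the formula maps to batch 2 of epoch 1, then batches 1 and 3 of epoch 2.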

Deployment using Heroku

Once everything is installed and you are logged in to Heroku, run the following inside the traingenerator directory:

heroku create

git push heroku main

heroku open

EndNotes

To contribute more templates, open a pull request on the GitHub repository. Traingenerator is a simple, user-friendly app for both technical and non-technical people. Its automatic code generation features can come in handy when many training scripts need to be set up quickly, as in large-scale production work.


Copyright Analytics India Magazine Pvt Ltd
