
Generate Your ML Code in a Few Clicks Using TrainGenerator


TrainGenerator is a Streamlit-based web app that generates machine learning template code covering data loading, preprocessing, model development, hyperparameter setting, and the other settings needed to build a complete model. This open-source tool was created by Johannes Rieke, a machine learning engineer. It eases the work of data scientists as well as non-technical people working in data science and machine learning. The generated code can be opened in a Google Colab notebook or downloaded in .py or .ipynb format.

TrainGenerator also lets users add their own custom templates. So far, only image classification templates have been released; object detection and other use cases are expected to follow.

The left sidebar of the web app holds the parameter settings. Under framework selection, there are options for PyTorch and scikit-learn. For PyTorch, the available models are AlexNet, ResNet, VGG, and DenseNet, with an option to use a model pre-trained on ImageNet; for scikit-learn, they are support vector machines, random forest, perceptron, k-nearest neighbours, and decision trees. The input data can be NumPy files or image files. Preprocessing options include resizing images to fit the model, centre cropping, and scaling to the mean and standard deviation expected by the pre-trained model. The training options include using a GPU if one is available and saving a model checkpoint. Hyperparameters include the loss function (CrossEntropyLoss or BCEWithLogitsLoss) and the optimizer (Adam, Adadelta, Adagrad, Adamax, RMSprop, SGD); the learning rate, batch size, number of epochs, and how often progress is printed can be set manually. Lastly, metrics can be logged for visualisation with TensorBoard or comet.ml, or not at all.
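To make the idea of template code generation concrete, here is a minimal sketch of how sidebar choices could be turned into code. It is not TrainGenerator's actual implementation: the template text, the `render_code` helper, and the `sidebar_options` dictionary are all hypothetical, and Python's built-in `string.Template` stands in for whatever templating engine the app really uses.

```python
from string import Template

# Hypothetical template; the real templates are far more elaborate.
TEMPLATE = Template(
    "model = models.$model(pretrained=$pretrained)\n"
    "loss_func = nn.$loss()\n"
    "optimizer = optim.$optimizer(model.parameters(), lr=$lr)\n"
)


def render_code(options: dict) -> str:
    """Fill the template placeholders with the chosen options."""
    return TEMPLATE.substitute(options)


# Assumed stand-in for the values a user picks in the sidebar.
sidebar_options = {
    "model": "resnet18",
    "pretrained": True,
    "loss": "CrossEntropyLoss",
    "optimizer": "Adam",
    "lr": 0.001,
}

generated = render_code(sidebar_options)
print(generated)
```

Changing a single entry in `sidebar_options`, for example the optimizer to `"SGD"`, changes only the corresponding line of the generated code. That is the appeal of a template-based generator.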

Code Snippet

There are two ways to use TrainGenerator: through the web app (as described above) or by running it locally:

git clone https://github.com/jrieke/traingenerator.git

cd traingenerator

pip install -r requirements.txt

streamlit run app/main.py

Code Generated for sklearn 

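 import numpy as np
 import sklearn
 import sklearn.preprocessing  # StandardScaler is used below
 import sklearn.utils          # shuffle is used below
 from sklearn.tree import DecisionTreeClassifier
 from torchvision import datasets, transforms
 import urllib.request         # urlretrieve lives in the request submodule
 import zipfile
 from tensorboardX import SummaryWriter
 from datetime import datetime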

# comment out this part to use own data

 url = "https://github.com/jrieke/traingenerator/raw/main/data/fake-image-data.zip"
 zip_path, _ = urllib.request.urlretrieve(url)
 with zipfile.ZipFile(zip_path, "r") as f:
     f.extractall("data") 

# Data insertion

 train_data = "data/image-data"  # required
 val_data = "data/image-data"    # optional
 test_data = None                # optional 

# Setting up logging.

 experiment_id = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
 writer = SummaryWriter(logdir=f"logs/{experiment_id}") 

# preprocessing

# Setting up a scalar.

 scaler = sklearn.preprocessing.StandardScaler()
 def preprocess(data, name):
     if data is None:  # val/test can be empty
         return None 

    # Reading image files to pytorch dataset 

     transform = transforms.Compose([
         transforms.Resize(28), 
         transforms.CenterCrop(28), 
         transforms.ToTensor()
     ])
     data = datasets.ImageFolder(data, transform=transform) 

    # Converting images to NumPy arrays.

     images_shape = (len(data), *data[0][0].shape)
     images = np.zeros(images_shape)
     labels = np.zeros(len(data))
     for i, (image, label) in enumerate(data):
         images[i] = image
         labels[i] = label
     images = images.reshape(len(images), -1) 

   # Scaling to mean 0 and std 1.

     if name == "train":
         scaler.fit(images)
     images = scaler.transform(images) 

    # Shuffling over the train set.

     if name == "train":
         images, labels = sklearn.utils.shuffle(images, labels)
     return [images, labels] 
 processed_train_data = preprocess(train_data, "train")
 processed_val_data = preprocess(val_data, "val")
 processed_test_data = preprocess(test_data, "test")
 model = DecisionTreeClassifier()
 def evaluate(data, name):
     if data is None:  # val/test can be empty
         return
     images, labels = data
     acc = model.score(images, labels)
     print(f"{name + ':':6} accuracy: {acc}")
     writer.add_scalar(f"{name}_accuracy", acc) 

# Train on train_data.

model.fit(*processed_train_data)

DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini', max_depth=None, max_features=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2,min_weight_fraction_leaf=0.0, presort='deprecated', random_state=None, splitter='best')

# Evaluation

 evaluate(processed_train_data, "train")
 evaluate(processed_val_data, "val")
 evaluate(processed_test_data, "test")

 train: accuracy: 1.0
 val:   accuracy: 1.0 

The complete notebook generated by TrainGenerator can be viewed here.
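The preprocessing step in the listing above fits the scaler on the training set only and then reuses it for the validation and test sets. The following sketch illustrates that idea in plain Python on a single feature. The helper names `fit_scaler` and `transform` and the sample values are made up for illustration; like scikit-learn's `StandardScaler`, it uses the population standard deviation.

```python
from statistics import fmean, pstdev


def fit_scaler(values):
    """Compute mean and standard deviation from the training values only."""
    mean = fmean(values)
    std = pstdev(values) or 1.0  # fall back to 1.0 for a constant feature
    return mean, std


def transform(values, mean, std):
    """Scale values with statistics that were fitted beforehand."""
    return [(v - mean) / std for v in values]


# Made-up pixel intensities for one feature.
train_pixels = [0.0, 0.5, 1.0]
val_pixels = [0.25, 0.75]

mean, std = fit_scaler(train_pixels)                # statistics come from train only
scaled_train = transform(train_pixels, mean, std)   # mean 0, std 1
scaled_val = transform(val_pixels, mean, std)       # reuses the train statistics
```

Because the validation values are scaled with the training statistics, they generally will not have exactly zero mean and unit variance, which is expected. Fitting on the validation or test data as well would leak information about those sets into preprocessing.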

Code Generated for PyTorch

 import numpy as np
 import torch
 from torch import optim, nn
 from torch.utils.data import DataLoader, TensorDataset
 from torchvision import models, datasets, transforms
 from ignite.engine import Events, create_supervised_trainer, create_supervised_evaluator
 from ignite.metrics import Accuracy, Loss
 from datetime import datetime
 from tensorboardX import SummaryWriter
 from pathlib import Path 

Data loading is the same as in the sklearn example above.
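The PyTorch listing that follows uses several names that this excerpt never defines: `lr`, `batch_size`, `num_epochs`, `print_every`, `device`, `kwargs`, and `checkpoint_dir`. The sketch below shows one plausible way to set them up. All values are assumptions chosen for illustration (5 epochs and printing after every batch match the log output shown further down), and the `use_cuda` flag is a hypothetical stand-in for a real GPU check. The `writer` object is assumed to be created as in the sklearn example.

```python
from datetime import datetime
from pathlib import Path

# Assumed hyperparameter values; adjust to your own experiment.
lr = 0.001
batch_size = 128
num_epochs = 5
print_every = 1  # print progress after every batch

# Hypothetical flag; with torch installed it could be set from
# torch.cuda.is_available() instead of being hard-coded.
use_cuda = False
device = "cuda" if use_cuda else "cpu"

# Extra DataLoader arguments that only make sense when a GPU is used.
kwargs = {"pin_memory": True, "num_workers": 1} if use_cuda else {}

# One checkpoint directory per run, named after the start time.
experiment_id = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
checkpoint_dir = Path("checkpoints") / experiment_id
checkpoint_dir.mkdir(parents=True, exist_ok=True)
```

Passing the device as a plain string works because both `model.to(...)` and the pytorch-ignite factory functions accept a device name as well as a `torch.device` object.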

# preprocessing

 def preprocess(data, name):
     if data is None:  # val/test can be empty
         return None 

    # Reading image files to pytorch dataset.

     transform = transforms.Compose([
         transforms.Resize(256), 
         transforms.CenterCrop(224), 
         transforms.ToTensor(), 
     ])
     dataset = datasets.ImageFolder(data, transform=transform)
     loader = DataLoader(dataset, batch_size=batch_size, shuffle=(name=="train"), **kwargs)
     return loader
 train_loader = preprocess(train_data, "train")
 val_loader = preprocess(val_data, "val")
 test_loader = preprocess(test_data, "test") 

# Setting up model, loss, optimizer.

 model = models.resnet18(pretrained=True)
 model = model.to(device)
 loss_func = nn.CrossEntropyLoss()
 optimizer = optim.Adam(model.parameters(), lr=lr) 

# Setting up pytorch-ignite trainer and evaluator.

 trainer = create_supervised_trainer(
     model,
     optimizer,
     loss_func,
     device=device,
 )
 metrics = {
     "accuracy": Accuracy(),
     "loss": Loss(loss_func),
 }
 evaluator = create_supervised_evaluator(
     model, metrics=metrics, device=device
 )
 @trainer.on(Events.ITERATION_COMPLETED(every=print_every))
 def log_batch(trainer):
     batch = (trainer.state.iteration - 1) % trainer.state.epoch_length + 1
     print(
         f"Epoch {trainer.state.epoch} / {num_epochs}, "
         f"batch {batch} / {trainer.state.epoch_length}: "
         f"loss: {trainer.state.output:.3f}"
     )
 @trainer.on(Events.EPOCH_COMPLETED)
 def log_epoch(trainer):
     print(f"Epoch {trainer.state.epoch} / {num_epochs} average results: ")
     def log_results(name, metrics, epoch):
         print(
             f"{name + ':':6} loss: {metrics['loss']:.3f}, "
             f"accuracy: {metrics['accuracy']:.3f}"
         )
         writer.add_scalar(f"{name}_loss", metrics["loss"], epoch)
         writer.add_scalar(f"{name}_accuracy", metrics["accuracy"], epoch) 

    # Training data.

     evaluator.run(train_loader)
     log_results("train", evaluator.state.metrics, trainer.state.epoch) 

    # Validation data.

     if val_loader:
         evaluator.run(val_loader)
         log_results("val", evaluator.state.metrics, trainer.state.epoch) 

    # Testing data.

     if test_loader:
         evaluator.run(test_loader)
         log_results("test", evaluator.state.metrics, trainer.state.epoch)
     print()
     print("-" * 80)
     print()
# Saving a checkpoint after every epoch.

 @trainer.on(Events.EPOCH_COMPLETED)
 def checkpoint_model(trainer):
     torch.save(model, checkpoint_dir / f"model-epoch{trainer.state.epoch}.pt")

# Starting training.

trainer.run(train_loader, max_epochs=num_epochs)

 Epoch 1 / 5, batch 1 / 1: loss: 8.112
 Epoch 1 / 5 average results: 
 train: loss: 10.275, accuracy: 0.000
 val:   loss: 11.407, accuracy: 0.000
 Epoch 2 / 5, batch 1 / 1: loss: 0.152
 Epoch 2 / 5 average results: 
 train: loss: 7.251, accuracy: 0.000
 val:   loss: 10.479, accuracy: 0.000
 Epoch 3 / 5, batch 1 / 1: loss: 0.185
 Epoch 3 / 5 average results: 
 train: loss: 4.322, accuracy: 0.500
 val:   loss: 10.263, accuracy: 0.000
 Epoch 4 / 5, batch 1 / 1: loss: 0.000
 Epoch 4 / 5 average results: 
 train: loss: 2.429, accuracy: 0.500
 val:   loss: 9.824, accuracy: 0.000
 Epoch 5 / 5, batch 1 / 1: loss: 0.000
 Epoch 5 / 5 average results: 
 train: loss: 1.521, accuracy: 0.750
 val:   loss: 9.791, accuracy: 0.000 

The complete notebook generated by TrainGenerator can be viewed here.
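The `log_batch` handler in the listing above converts the trainer's running iteration count into a batch position within the current epoch using `(iteration - 1) % epoch_length + 1`. The short example below works through that arithmetic, assuming the iteration counter starts at 1 and keeps increasing across epochs. The function name `batch_in_epoch` is made up for illustration.

```python
def batch_in_epoch(iteration: int, epoch_length: int) -> int:
    """Map a 1-based global iteration to a 1-based batch index within its epoch."""
    return (iteration - 1) % epoch_length + 1


# With 3 batches per epoch, iterations 1..7 span a little over two epochs.
epoch_length = 3
positions = [batch_in_epoch(i, epoch_length) for i in range(1, 8)]
print(positions)  # [1, 2, 3, 1, 2, 3, 1]
```

Subtracting 1 before the modulo and adding it back afterwards keeps the result in the range 1 to `epoch_length`. A plain `iteration % epoch_length` would report the last batch of every epoch as 0.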

Deployment using Heroku

Once everything is installed and you are logged in to Heroku, run the following inside the traingenerator directory:

heroku create

git push heroku main

heroku open

EndNotes

To contribute more templates, open a pull request on the GitHub repository. TrainGenerator is a simple, user-friendly app for both technical and non-technical people. Its automatic code generation features come in very handy for large-scale production work.

Jayita Bhattacharyya

Machine learning and data science enthusiast. Eager to learn new technology advances. A self-taught techie who loves to do cool stuff using technology for fun and worthwhile.
