Last updated February 28, 2024

Generate Your ML Code In Few Clicks Using Train Generator

Published on January 4, 2021
by Jayita Bhattacharyya

Traingenerator is a Streamlit-based web app that generates machine learning template code covering the main stages of model building: data loading, preprocessing, model definition, hyperparameter settings, and related training options. This open-source tool was created by Johannes Rieke, a machine learning engineer. It eases the work of data scientists and makes model building more approachable for non-technical people in data science and machine learning. The generated code can be opened in a Google Colab notebook or downloaded in .py or .ipynb format.

Traingenerator also lets users add their own custom templates. So far, only image classification templates have been released; object detection and other use cases are expected to follow.

The left sidebar of the web app contains the parameter settings. Under framework selection, the options are PyTorch and scikit-learn. For PyTorch, the available models are AlexNet, ResNet, VGG, and DenseNet, with the option of loading weights pre-trained on ImageNet; for scikit-learn, they are support vector machine, random forest, perceptron, k-nearest neighbours, and decision tree. The input data can be supplied as NumPy files or image files. The preprocessing options include resizing images to a size compatible with the model, centre cropping, and scaling with the mean and standard deviation expected by the pre-trained model. Next come the training options, including using a GPU if one is available and saving a model checkpoint. The hyperparameters include the loss function (CrossEntropyLoss or BCEWithLogitsLoss) and the optimizer (Adam, Adadelta, Adagrad, Adamax, RMSprop, or SGD); the learning rate, batch size, number of epochs, and how often progress is printed can be specified manually. Lastly, metrics can be logged to TensorBoard or comet.ml, or logging can be switched off.
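To make the idea of template-based code generation concrete, here is a minimal sketch of how sidebar choices could be substituted into a code template. It uses Python's built-in string.Template for brevity. The template text, the SIDEBAR_CHOICES dictionary and the render_code helper are hypothetical and are not taken from Traingenerator's own implementation.

```python
from string import Template

# Hypothetical code template; the $-placeholders stand for sidebar choices.
CODE_TEMPLATE = Template(
    "from sklearn.$module import $model_class\n"
    "\n"
    "model = $model_class()\n"
    "model.fit(train_images, train_labels)\n"
)

# Hypothetical mapping from a sidebar label to the values the template needs.
SIDEBAR_CHOICES = {
    "Decision tree": {"module": "tree", "model_class": "DecisionTreeClassifier"},
    "Random forest": {"module": "ensemble", "model_class": "RandomForestClassifier"},
}


def render_code(model_label: str) -> str:
    """Fill the template with the values belonging to the selected model."""
    return CODE_TEMPLATE.substitute(SIDEBAR_CHOICES[model_label])


print(render_code("Random forest"))
```

Picking a different label in SIDEBAR_CHOICES changes both the import line and the model constructor, which is the kind of substitution a template-driven generator performs for every sidebar option.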

Code Snippet

There are two ways to use Traingenerator: through the hosted web app (described above) or by running it locally. To run it locally:

git clone https://github.com/jrieke/traingenerator.git

cd traingenerator

pip install -r requirements.txt

streamlit run app/main.py

Code Generated for sklearn

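 import numpy as np
 import sklearn.preprocessing  # makes sklearn.preprocessing.StandardScaler available
 import sklearn.utils          # makes sklearn.utils.shuffle available
 from sklearn.tree import DecisionTreeClassifier
 from torchvision import datasets, transforms
 import urllib.request         # a bare "import urllib" does not load the request submodule
 import zipfile
 from tensorboardX import SummaryWriter
 from datetime import datetime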

# comment out this part to use own data

 url = "https://github.com/jrieke/traingenerator/raw/main/data/fake-image-data.zip"
 zip_path, _ = urllib.request.urlretrieve(url)
 with zipfile.ZipFile(zip_path, "r") as f:
     f.extractall("data")

# Data insertion

 train_data = "data/image-data"  # required
 val_data = "data/image-data"    # optional
 test_data = None                # optional

# Setting up logging.

 experiment_id = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
 writer = SummaryWriter(logdir=f"logs/{experiment_id}")

# preprocessing

# Setting up a scaler.

 scaler = sklearn.preprocessing.StandardScaler()
 def preprocess(data, name):
     if data is None:  # val/test can be empty
         return None

     # Reading image files to pytorch dataset.
     transform = transforms.Compose([
         transforms.Resize(28),
         transforms.CenterCrop(28),
         transforms.ToTensor()
     ])
     data = datasets.ImageFolder(data, transform=transform)

     # Converting images to NumPy arrays.
     images_shape = (len(data), *data[0][0].shape)
     images = np.zeros(images_shape)
     labels = np.zeros(len(data))
     for i, (image, label) in enumerate(data):
         images[i] = image
         labels[i] = label
     images = images.reshape(len(images), -1)

     # Scaling to mean 0 and std 1 (the scaler is fitted on the train set only).
     if name == "train":
         scaler.fit(images)
     images = scaler.transform(images)

     # Shuffling the train set.
     if name == "train":
         images, labels = sklearn.utils.shuffle(images, labels)
     return [images, labels]

 processed_train_data = preprocess(train_data, "train")
 processed_val_data = preprocess(val_data, "val")
 processed_test_data = preprocess(test_data, "test")
 model = DecisionTreeClassifier()
 def evaluate(data, name):
     if data is None:  # val/test can be empty
         return
     images, labels = data
     acc = model.score(images, labels)
     print(f"{name + ':':6} accuracy: {acc}")
     writer.add_scalar(f"{name}_accuracy", acc)

# Train on train_data.

model.fit(*processed_train_data)

DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini', max_depth=None, max_features=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2,min_weight_fraction_leaf=0.0, presort='deprecated', random_state=None, splitter='best')

# Evaluation

 evaluate(processed_train_data, "train")
 evaluate(processed_val_data, "val")
 evaluate(processed_test_data, "test")

 train: accuracy: 1.0
 val:   accuracy: 1.0

The complete notebook generated by Traingenerator can be viewed here.
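The preprocessing step above flattens each image into a single feature vector and fits the scaler on the training set only, then reuses those statistics for the validation and test data. The sketch below reproduces that idea on small random arrays, using plain NumPy in place of torchvision and StandardScaler. The array shapes, the random data, and the flatten_images and fit_scaler helpers are illustrative assumptions, not part of the generated code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for image batches shaped (num_images, channels, height, width).
train_images = rng.random((8, 3, 28, 28))
val_images = rng.random((4, 3, 28, 28))


def flatten_images(images):
    """Turn (N, C, H, W) images into (N, C*H*W) feature vectors."""
    return images.reshape(len(images), -1)


def fit_scaler(features):
    """Compute per-feature mean and std; zero stds become 1 to avoid dividing by zero."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0
    return mean, std


train_flat = flatten_images(train_images)
val_flat = flatten_images(val_images)

# Statistics come from the training set only and are reused for validation.
mean, std = fit_scaler(train_flat)
train_scaled = (train_flat - mean) / std
val_scaled = (val_flat - mean) / std

print(train_flat.shape)  # (8, 2352)
```

Reusing the training statistics keeps information about the validation and test images out of the preprocessing, which is why the generated code calls scaler.fit only when name == "train".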

Code Generated for PyTorch

 import numpy as np
 import torch
 from torch import optim, nn
 from torch.utils.data import DataLoader, TensorDataset
 from torchvision import models, datasets, transforms
 from ignite.engine import Events, create_supervised_trainer, create_supervised_evaluator
 from ignite.metrics import Accuracy, Loss
 from datetime import datetime
 from tensorboardX import SummaryWriter
 from pathlib import Path

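# Data loading is the same as in the sklearn example above. The setup of hyperparameters, device and logging is omitted from this excerpt.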
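The PyTorch code that follows refers to several names this excerpt never defines: lr, batch_size, num_epochs, print_every, checkpoint_dir, device, kwargs and writer. A minimal setup along the following lines would make them available. Every value here is an assumption chosen for illustration (only the five epochs match the training log shown later), and the "checkpoints" directory name is hypothetical.

```python
from datetime import datetime
from pathlib import Path

# Hyperparameters (illustrative values, not taken from the article).
lr = 0.001
batch_size = 128
num_epochs = 5    # the training log further down shows 5 epochs
print_every = 1   # print progress after every batch

# Directory for the per-epoch checkpoints (hypothetical location).
experiment_id = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
checkpoint_dir = Path("checkpoints") / experiment_id
checkpoint_dir.mkdir(parents=True, exist_ok=True)

# The remaining names depend on torch and tensorboardX, which the imports
# listed above already load. They are left commented out so that this
# sketch runs without those packages installed:
# use_cuda = torch.cuda.is_available()
# device = torch.device("cuda" if use_cuda else "cpu")
# kwargs = {"pin_memory": True, "num_workers": 1} if use_cuda else {}
# writer = SummaryWriter(logdir=f"logs/{experiment_id}")
```

Uncommenting the last four lines after the imports gives the training code everything it references.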

# preprocessing

 def preprocess(data, name):
     if data is None:  # val/test can be empty
         return None

     # Reading image files to pytorch dataset.
     transform = transforms.Compose([
         transforms.Resize(256),
         transforms.CenterCrop(224),
         transforms.ToTensor(),
     ])
     dataset = datasets.ImageFolder(data, transform=transform)
     loader = DataLoader(dataset, batch_size=batch_size, shuffle=(name=="train"), **kwargs)
     return loader

 train_loader = preprocess(train_data, "train")
 val_loader = preprocess(val_data, "val")
 test_loader = preprocess(test_data, "test")

# Setting up model, loss, optimizer.

 model = models.resnet18(pretrained=True)
 model = model.to(device)
 loss_func = nn.CrossEntropyLoss()
 optimizer = optim.Adam(model.parameters(), lr=lr)

# Setting up pytorch-ignite trainer and evaluator.

 trainer = create_supervised_trainer(
     model,
     optimizer,
     loss_func,
     device=device,
 )
 metrics = {
     "accuracy": Accuracy(),
     "loss": Loss(loss_func),
 }
 evaluator = create_supervised_evaluator(
     model, metrics=metrics, device=device
 )
 @trainer.on(Events.ITERATION_COMPLETED(every=print_every))
 def log_batch(trainer):
     batch = (trainer.state.iteration - 1) % trainer.state.epoch_length + 1
     print(
         f"Epoch {trainer.state.epoch} / {num_epochs}, "
         f"batch {batch} / {trainer.state.epoch_length}: "
         f"loss: {trainer.state.output:.3f}"
     )
 @trainer.on(Events.EPOCH_COMPLETED)
 def log_epoch(trainer):
     print(f"Epoch {trainer.state.epoch} / {num_epochs} average results: ")

     def log_results(name, metrics, epoch):
         print(
             f"{name + ':':6} loss: {metrics['loss']:.3f}, "
             f"accuracy: {metrics['accuracy']:.3f}"
         )
         writer.add_scalar(f"{name}_loss", metrics["loss"], epoch)
         writer.add_scalar(f"{name}_accuracy", metrics["accuracy"], epoch)

     # Training data.
     evaluator.run(train_loader)
     log_results("train", evaluator.state.metrics, trainer.state.epoch)

     # Validation data.
     if val_loader:
         evaluator.run(val_loader)
         log_results("val", evaluator.state.metrics, trainer.state.epoch)

     # Testing data.
     if test_loader:
         evaluator.run(test_loader)
         log_results("test", evaluator.state.metrics, trainer.state.epoch)

     print()
     print("-" * 80)
     print()

 # Saving a checkpoint after every epoch.
 @trainer.on(Events.EPOCH_COMPLETED)
 def checkpoint_model(trainer):
     torch.save(model, checkpoint_dir / f"model-epoch{trainer.state.epoch}.pt")

# Starting training.

trainer.run(train_loader, max_epochs=num_epochs)

 Epoch 1 / 5, batch 1 / 1: loss: 8.112
 Epoch 1 / 5 average results: 
 train: loss: 10.275, accuracy: 0.000
 val:   loss: 11.407, accuracy: 0.000
 Epoch 2 / 5, batch 1 / 1: loss: 0.152
 Epoch 2 / 5 average results: 
 train: loss: 7.251, accuracy: 0.000
 val:   loss: 10.479, accuracy: 0.000
 Epoch 3 / 5, batch 1 / 1: loss: 0.185
 Epoch 3 / 5 average results: 
 train: loss: 4.322, accuracy: 0.500
 val:   loss: 10.263, accuracy: 0.000
 Epoch 4 / 5, batch 1 / 1: loss: 0.000
 Epoch 4 / 5 average results: 
 train: loss: 2.429, accuracy: 0.500
 val:   loss: 9.824, accuracy: 0.000
 Epoch 5 / 5, batch 1 / 1: loss: 0.000
 Epoch 5 / 5 average results: 
 train: loss: 1.521, accuracy: 0.750
 val:   loss: 9.791, accuracy: 0.000

The complete notebook generated by Traingenerator can be viewed here.
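The batch number printed by log_batch is derived from the trainer's iteration counter, which, as the expression in the code implies, keeps counting across epochs rather than resetting. The expression (iteration - 1) % epoch_length + 1 converts it into a 1-based position inside the current epoch. Below is a small stand-alone check of that arithmetic; batch_in_epoch is a hypothetical helper name and the epoch length of 4 is an arbitrary example.

```python
def batch_in_epoch(iteration: int, epoch_length: int) -> int:
    """Map a 1-based global iteration count to a 1-based batch index within its epoch."""
    return (iteration - 1) % epoch_length + 1


# With 4 batches per epoch, iterations 1..8 cover two full epochs.
epoch_length = 4
for iteration in range(1, 9):
    print(iteration, batch_in_epoch(iteration, epoch_length))
```

Subtracting 1 before the modulo and adding 1 afterwards is what makes the last batch of an epoch report epoch_length instead of 0.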

Deployment using Heroku

After installing the Heroku CLI and logging in to Heroku, run the following inside the traingenerator directory:

heroku create

git push heroku main

heroku open

EndNotes

To contribute more templates, open a pull request on the GitHub repository. Traingenerator is a simple, user-friendly app for both technical and non-technical people. Its automatic code generation features come in handy when training scripts have to be produced at scale.

Jayita Bhattacharyya

Machine learning and data science enthusiast. Eager to learn new technology advances. A self-taught techie who loves to do cool stuff using technology for fun and worthwhile.
