Last updated August 24, 2021
In AI Mysteries

Discovering Symbolic Models From Deep Learning With Inductive Biases

Share

Published on July 4, 2021

by Victor Dey

In machine learning, the aim is to create algorithms that can learn and predict a required target output from the learnings. To achieve this, the learning algorithm is presented and fed with some examples it can train and learn from to achieve the intended relation of input and output values. Then the learner, a model, in this case, is intended to approximate the correct output, even for examples that have been unseen during the training phase. Without any additional assumptions, problems cannot be solved since unseen situations might have an arbitrary output value. The necessary assumptions about the nature of the target function values are subsumed in the phrase inductive bias. Inductive Bias, also sometimes known as learning bias, is a predictive learning algorithm that uses a set of assumptions that the learner uses to predict outputs of given inputs that it has not encountered yet.

It can also be defined as the process of learning general principles from the data based on specific instances; in other words, it’s what any machine learning algorithm does when it produces a prediction for an unseen test instance based on a definite number of training instances. Inductive bias describes the tendency for a system to prefer a certain set of assumptions over others within the observed data. Without inductive bias introduced, a learner model cannot differentiate or make decisions from observed samples to new samples, and this process is better than random guessing.

Every machine learning algorithm used in practice to date, from the nearest neighbors to gradient boosting machines, come with their own set of inductive biases, such learning algorithms at times possess a bias to learn about similar items or close in terms of attributes to one another in a feature space and hence they are more likely to be classified in the same class. Linear models such as logistic regression assume that a linear boundary can always separate the classes, although this being a hard bias, as the model cannot learn anything else.

Algebraic expressions are usually compact and present us with explicit interpretations and generalize assumptions well. However, finding such algebraic expressions is a difficult task. Symbolic regression comes to the rescue as one of the options. It is a supervised machine learning technique that assembles analytic functions to create a model for a given dataset. However, genetic algorithms are traditionally used, which are essentially brute force procedures that scale exponentially with input variables and operators. On the other hand, deep learning methods allow efficient training of complex models on a high dimensional dataset. However, these learned models are typically black boxes and can be difficult to interpret.

About Symbolic Models Framework with Inductive Biases

Symbolic Model Framework proposes a general framework to leverage the advantages of both traditional deep learning and symbolic regression. As an example, the study of Graph Networks (GNs or GNNs) can be presented, as they have strong and well-motivated inductive biases that are very well suited to complex problems that can be explained. Symbolic regression is applied to fit the different internal parts of the learned model that operate on a reduced size of representations. A Number of symbolic expressions can also be joined together, giving rise to an overall algebraic equation equivalent to the trained Graph Network. The framework can be applied to more such problems as rediscovering force laws, rediscovering Hamiltonians, and a real-world astrophysical challenge, demonstrating that drastic improvements can be made to generalization, and plausible analytical expressions are being made distilled. Not only can it recover the injected closed-form physical laws for the Newtonian and Hamiltonian examples, but it can also derive a new interpretable closed-form analytical expression that can be useful in astrophysics.

The Symbolic Model framework can be summarized as :

Engineering a deep learning model with a separable internal structure that would provide an inductive bias well matched to the nature of the data. Specifically, Graph Networks can be used as the core inductive bias into the models in the case of interacting particles.

The model is then trained end-to-end, making use of available data.

Symbolic expressions are fitted to the distinct functions learned by the created model internally.

Symbolic expressions then replace functions in the deep model.

`Image source : https://arxiv.org/pdf/2006.11287.pdf`

Getting Started with Creating a Deep Learning Model With Inductive Bias

This demonstration will try to predict the path dynamics from a simple particle model using Graph Neural Network. We will also try to induce Low Dimensionality for a clearer understanding and predict the dynamic path movement for newly induced particles and extract it to the symbolic equation. The following implementation is inspired by the official demo of the Symbolic model, whose Github repository can be explored here.

Installing the prerequisites

The first step will be to install all the dependencies to create the model, using the following code.

 #Basic pre-reqs:
 import numpy as np
 import os
 import torch
 from torch.autograd import Variable
 from matplotlib import pyplot as plt
 %matplotlib inline

Downloading further prerequisites and then code for simulations and model files, here we will introduce the celluloid library to create interactive animations from our model

!pip install celluloid #to create interactive animations

Installing geometric torch to process our complex model,

 version_nums = torch.__version__.split('.')
 # Torch Geometric seems to always build for *.*.0 of torch :
 version_nums[-1] = '0' + version_nums[-1][1:]
 os.environ['TORCH'] = '.'.join(version_nums)
 !pip install --upgrade torch-scatter -f https://pytorch-geometric.com/whl/torch-${TORCH}.html && pip install --upgrade torch-sparse -f https://pytorch-geometric.com/whl/torch-${TORCH}.html && pip install --upgrade torch-geometric
 #importing the particle model and simulation model
 !wget https://raw.githubusercontent.com/anon928374/gn/master/models.py -O models.py
 !wget https://raw.githubusercontent.com/anon928374/gn/master/simulate.py -O simulate.py
 #calling model and simulation environment
 import models
 import simulate

As this would require heavy processing, make sure you are making use of a GPU

torch.ones(1).cuda() #calling GPU

Now we will create our simulation and set the parameters for it:

 # Number of simulations to run :
 ns = 10000
 # Potential 
 sim = 'spring'
 # Number of nodes
 n = 4
 # Dimension
 dim = 2
 # Number of time steps
 nt = 1000
 #Standard simulation sets:
 n_set = [4, 8]
 sim_sets = [
  {'sim': 'r1', 'dt': [5e-3], 'nt': [1000], 'n': n_set, 'dim': [2, 3]},
  {'sim': 'r2', 'dt': [1e-3], 'nt': [1000], 'n': n_set, 'dim': [2, 3]},
  {'sim': 'spring', 'dt': [1e-2], 'nt': [1000], 'n': n_set, 'dim': [2, 3]},
  {'sim': 'string', 'dt': [1e-2], 'nt': [1000], 'n': [30], 'dim': [2]},
  {'sim': 'charge', 'dt': [1e-3], 'nt': [1000], 'n': n_set, 'dim': [2, 3]},
  {'sim': 'superposition', 'dt': [1e-3], 'nt': [1000], 'n': n_set, 'dim': [2, 3]},
  {'sim': 'damped', 'dt': [2e-2], 'nt': [1000], 'n': n_set, 'dim': [2, 3]},
  {'sim': 'discontinuous', 'dt': [1e-2], 'nt': [1000], 'n': n_set, 'dim': [2, 3]},
 ]
 #Select the hand-tuned dt value for a smooth simulation
 # (since scales are different in each potential):
 dt = [ss['dt'][0] for ss in sim_sets if ss['sim'] == sim][0]
 title = '{}_n={}_dim={}_nt={}_dt={}'.format(sim, n, dim, nt, dt)
 print('Running on', title)

Calling our simulation dataset and setting the animation :

 from simulate import SimulationDataset
 s = SimulationDataset(sim, n=n, dim=dim, nt=nt//2, dt=dt)
 base_str = './'
 data_str = title
 s.simulate(ns)

Checking the shape of the called dataset,

 data = s.data
 s.data.shape

Output :

(10000, 500, 4, 6)

Synthesizing the set animation into action,

s.plot(0, animate=True, plot_size=False)

We will get the following interactive animation with control bar from celluloid :

Further tuning our animations and making use of deep neural networks for prediction,

 accel_data = s.get_acceleration() # adding Depth to our animations
 X = torch.from_numpy(np.concatenate([s.data[:, i] for i in range(0, s.data.shape[1], 5)]))
 y = torch.from_numpy(np.concatenate([accel_data[:, i] for i in range(0, s.data.shape[1], 5)]))
 #dividing dataset into train and test 
 from sklearn.model_selection import train_test_split 
 X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=False)
 #adding optimizers and creating the graph neural network
 import torch
 from torch import nn
 from torch.functional import F
 from torch.optim import Adam
 from torch_geometric.nn import MetaLayer, MessagePassing
 from models import OGN, varOGN, make_packer, make_unpacker, get_edge_index
 aggr = 'add'
 hidden = 300
 test = '_l1_'
 #This test applies an explicit bottleneck:
 msg_dim = 100
 n_f = data.shape[3]
 #importing Dataloader for feeding input data
 from torch_geometric.data import Data, DataLoader
 from models import get_edge_index
 edge_index = get_edge_index(n, sim)
 if test == '_kl_':
     ogn = varOGN(n_f, msg_dim, dim, dt=0.1, hidden=hidden, edge_index=get_edge_index(n, sim), aggr=aggr).cuda()
 else:
     ogn = OGN(n_f, msg_dim, dim, dt=0.1, hidden=hidden, edge_index=get_edge_index(n, sim), aggr=aggr).cuda()
 messages_over_time = []
 ogn = ogn.cuda()
 #generating the tesors
 _q = Data(
     x=X_train[0].cuda(),
     edge_index=edge_index.cuda(),
     y=y_train[0].cuda())
 ogn(_q.x, _q.edge_index), ogn.just_derivative(_q).shape, _q.y.shape, ogn.loss(_q),
 Output :
 (tensor([[ 0.0231,  0.0075],
          [ 0.0269,  0.0156],
          [ 0.0383, -0.0070],
          [ 0.0295,  0.0047]], device='cuda:0', grad_fn=<AddmmBackward>),
  torch.Size([4, 2]),
  torch.Size([4, 2]),
  tensor(24.7884, device='cuda:0', grad_fn=<SumBackward0>))
 #setting the batch size
 batch = int(64 * (4 / n)**2)
 trainloader = DataLoader(
     [Data(
         Variable(X_train[i]),
         edge_index=edge_index,
         y=Variable(y_train[i])) for i in range(len(y_train))],
     batch_size=batch,
     shuffle=True
 )
 testloader = DataLoader(
     [Data(
         X_test[i],
         edge_index=edge_index,
         y=y_test[i]) for i in range(len(y_test))],
     batch_size=1024,
     shuffle=True
 )
 from torch.optim.lr_scheduler import ReduceLROnPlateau, OneCycleLR
 Defining loss function for the graph neural network.
 #defining loss function 
 def new_loss(self, g, augment=True, square=False):
     if square:
         return torch.sum((g.y - self.just_derivative(g, augment=augment))**2)
     else:
         base_loss = torch.sum(torch.abs(g.y - self.just_derivative(g, augment=augment)))
         if test in ['_l1_', '_kl_']:
             s1 = g.x[self.edge_index[0]]
             s2 = g.x[self.edge_index[1]]
             if test == '_l1_':
                 m12 = self.message(s1, s2)
                 regularization = 1e-2
                 #Want one loss value per row of g.y:
                 normalized_l05 = torch.sum(torch.abs(m12))
                 return base_loss, regularization * batch * normalized_l05 / n**2 * n
             elif test == '_kl_':
                 regularization = 1
                 #Want one loss value per row of g.y:
                 tmp = torch.cat([s1, s2], dim=1)  # tmp has shape [E, 2 * in_channels]
                 raw_msg = self.msg_fnc(tmp)
                 mu = raw_msg[:, 0::2]
                 logvar = raw_msg[:, 1::2]
                 full_kl = torch.sum(torch.exp(logvar) + mu**2 - logvar)/2.0
                 return base_loss, regularization * batch * full_kl / n**2 * n
         return base_loss
 Setting optimizers and number of epochs to train our Graph Neural Network.
 init_lr = 1e-3
 opt = torch.optim.Adam(ogn.parameters(), lr=init_lr, weight_decay=1e-8)
 # total_epochs = 200
 total_epochs = 30
 batch_per_epoch = int(1000*10 / (batch/32.0))
 sched = OneCycleLR(opt, max_lr=init_lr,
                    steps_per_epoch=batch_per_epoch,#len(trainloader),
                    epochs=total_epochs, final_div_factor=1e5)
 batch_per_epoch
 from tqdm import tqdm
 import numpy as onp
 onp.random.seed(0)
 test_idxes = onp.random.randint(0, len(X_test), 1000)
 #Record messages over test dataset here:
 newtestloader = DataLoader(
     [Data(
         X_test[i],
         edge_index=edge_index,
         y=y_test[i]) for i in test_idxes],
     batch_size=len(X_test),
     shuffle=False
 )

Setting our model equation for prediction,

 import numpy as onp
 import pandas as pd
 def get_messages(ogn):
     def get_message_info(tmp):
         ogn.cpu()
         s1 = tmp.x[tmp.edge_index[0]]
         s2 = tmp.x[tmp.edge_index[1]]
         tmp = torch.cat([s1, s2], dim=1)  # tmp has shape [E, 2 * in_channels]
         if test == '_kl_':
             raw_msg = ogn.msg_fnc(tmp)
             mu = raw_msg[:, 0::2]
             logvar = raw_msg[:, 1::2]
             m12 = mu
         else:
             m12 = ogn.msg_fnc(tmp)
         all_messages = torch.cat((
             s1,
             s2,
             m12), dim=1)
         if dim == 2:
             columns = [elem%(k) for k in range(1, 3) for elem in 'x%d y%d vx%d vy%d q%d m%d'.split(' ')]
             columns += ['e%d'%(k,) for k in range(msg_dim)]
         elif dim == 3:
             columns = [elem%(k) for k in range(1, 3) for elem in 'x%d y%d z%d vx%d vy%d vz%d q%d m%d'.split(' ')]
             columns += ['e%d'%(k,) for k in range(msg_dim)]
         return pd.DataFrame(
             data=all_messages.cpu().detach().numpy(),
             columns=columns
         )
     msg_info = []
     for i, g in enumerate(newtestloader):
         msg_info.append(get_message_info(g))
     msg_info = pd.concat(msg_info)
     msg_info['dx'] = msg_info.x1 - msg_info.x2
     msg_info['dy'] = msg_info.y1 - msg_info.y2
     if dim == 2:
         msg_info['r'] = np.sqrt(
             (msg_info.dx)**2 + (msg_info.dy)**2
         )
     elif dim == 3:
         msg_info['dz'] = msg_info.z1 - msg_info.z2
         msg_info['r'] = np.sqrt(
             (msg_info.dx)**2 + (msg_info.dy)**2 + (msg_info.dz)**2
         )
     return msg_info
 #importing recorded model
 recorded_models = []

Training our GNN Model:

 for epoch in tqdm(range(epoch, total_epochs)):
     ogn.cuda()
     total_loss = 0.0
     i = 0
     num_items = 0
     while i < batch_per_epoch:
         for ginput in trainloader:
             if i >= batch_per_epoch:
                 break
             opt.zero_grad()
             ginput.x = ginput.x.cuda()
             ginput.y = ginput.y.cuda()
             ginput.edge_index = ginput.edge_index.cuda()
             ginput.batch = ginput.batch.cuda()
             if test in ['_l1_', '_kl_']:
                 loss, reg = new_loss(ogn, ginput, square=False)
                 ((loss + reg)/int(ginput.batch[-1]+1)).backward()
             else:
                 loss = ogn.loss(ginput, square=False)
                 (loss/int(ginput.batch[-1]+1)).backward()
             opt.step()
             sched.step()
             total_loss += loss.item()
             i += 1
             num_items += int(ginput.batch[-1]+1)
     cur_loss = total_loss/num_items
     print(cur_loss)
     cur_msgs = get_messages(ogn)
     cur_msgs['epoch'] = epoch
     cur_msgs['loss'] = cur_loss
     messages_over_time.append(cur_msgs)
     ogn.cpu()
     from copy import deepcopy as copy
     recorded_models.append(ogn.state_dict())

Epoch Output :

  0%|          | 0/30 [00:00<?, ?it/s]12.383322237300872
   7%|▋         | 2/30 [01:50<25:52, 55.44s/it]8.476871449041367
  10%|█         | 3/30 [02:46<24:56, 55.42s/it]6.120438998270035
  13%|█▎        | 4/30 [03:41<24:00, 55.42s/it]5.34595146484375
  17%|█▋        | 5/30 [04:37<23:06, 55.44s/it]4.251027070164681
  20%|██        | 6/30 [05:32<22:11, 55.47s/it]3.295241960835457
  23%|██▎       | 7/30 [06:28<21:16, 55.50s/it]2.725217816901207
  27%|██▋       | 8/30 [07:24<20:22, 55.58s/it]2.4214660188436508
  30%|███       | 9/30 [08:19<19:27, 55.62s/it]2.2335157041072846
  33%|███▎      | 10/30 [09:15<18:33, 55.66s/it]2.0471408730864527
  37%|███▋      | 11/30 [10:11<17:37, 55.64s/it]1.904685971081257
  40%|████      | 12/30 [11:06<16:41, 55.66s/it]1.7552172343373298
  43%|████▎     | 13/30 [12:02<15:46, 55.70s/it]1.5958639140605926
  47%|████▋     | 14/30 [12:58<14:51, 55.73s/it]1.4858177517175675
  50%|█████     | 15/30 [13:54<13:57, 55.82s/it]1.401456927740574
  53%|█████▎    | 16/30 [14:50<13:03, 55.97s/it]1.2299495659947395
  57%|█████▋    | 17/30 [15:46<12:07, 55.97s/it]1.1393675281226634
  60%|██████    | 18/30 [16:42<11:11, 55.93s/it]1.0435665905714036
  63%|██████▎   | 19/30 [17:38<10:15, 55.91s/it]0.9252723445832729
  67%|██████▋   | 20/30 [18:34<09:18, 55.90s/it]0.7918352241277695
  70%|███████   | 21/30 [19:30<08:22, 55.89s/it]0.7148197756946086
  73%|███████▎  | 22/30 [20:25<07:27, 55.88s/it]0.6202942582428456
  77%|███████▋  | 23/30 [21:21<06:30, 55.85s/it]0.5343491077363491
  80%|████████  | 24/30 [22:17<05:35, 55.85s/it]0.453341006103158
  83%|████████▎ | 25/30 [23:13<04:39, 55.84s/it]0.39567755371034147
  87%|████████▋ | 26/30 [24:09<03:43, 55.83s/it]0.3454551042586565
  90%|█████████ | 27/30 [25:04<02:47, 55.71s/it]0.30814071524441244
  93%|█████████▎| 28/30 [26:00<01:51, 55.68s/it]0.2823368747919798
  97%|█████████▋| 29/30 [26:55<00:55, 55.57s/it]0.26827718360722064
 100%|██████████| 30/30 [27:51<00:00, 55.70s/it]0.2626891137778759

Predicting new Particles & Plotting our simulation from the trained model,

 #setting color for predicted particles 
 from simulate import make_transparent_color
 from scipy.integrate import odeint
 Introducing Different Point of view to observe predictions :
 fig, ax = plt.subplots(1, 2, figsize=(8, 4))
 camera = Camera(fig)
 for current_model in [-1] + [1, 34, 67, 100, 133, 166, 199]:
     i = 4 #Use this simulation
     if current_model > len(recorded_models):
         continue
     #Truth:
     cutoff_time = 300
     times = onp.array(s.times)[:cutoff_time]
     x_times = onp.array(data[i, :cutoff_time])
     masses = x_times[:, :, -1]
     length_of_tail = 75
     #Learned:
     e = edge_index.cuda()
     ogn.cpu()
     if current_model > -1:
         ogn.load_state_dict(recorded_models[current_model])
     else:
         # Random model!
         ogn = OGN(n_f, msg_dim, dim, dt=0.1, hidden=hidden, edge_index=get_edge_index(n, sim), aggr=aggr).cuda()
     ogn.cuda()
     def odefunc(y, t=None):
         y = y.reshape(4, 6).astype(np.float32)
         cur = Data(
             x=torch.from_numpy(y).cuda(),
             edge_index=e
         )
         dx = y[:, 2:4]
         dv = ogn.just_derivative(cur).cpu().detach().numpy()
         dother = np.zeros_like(dx)
         return np.concatenate((dx, dv, dother), axis=1).ravel()
     datai = odeint(odefunc, (onp.asarray(x_times[0]).ravel()), times).reshape(-1, 4, 6)
     x_times2 = onp.array(datai)
     d_idx = 10
     for t_idx in range(d_idx, cutoff_time, d_idx):
         start = max([0, t_idx-length_of_tail])
         ctimes = times[start:t_idx]
         cx_times = x_times[start:t_idx]
         cx_times2 = x_times2[start:t_idx]
         for j in range(n):
             rgba = make_transparent_color(len(ctimes), j/n)
             ax[0].scatter(cx_times[:, j, 0], cx_times[:, j, 1], color=rgba)
             ax[1].scatter(cx_times2[:, j, 0], cx_times2[:, j, 1], color=rgba)
             black_rgba = rgba
             black_rgba[:, :3] = 0.75
             ax[1].scatter(cx_times[:, j, 0], cx_times[:, j, 1], color=black_rgba, zorder=-1)
         for k in range(2):
             ax[k].set_xlim(-1, 3)
             ax[k].set_ylim(-3, 1)
         plt.tight_layout()
         camera.snap()
 # camera.animate().save('multiple_animations_with_comparison.mp4')
 from IPython.display import HTML
 HTML(camera.animate().to_jshtml())

Final Output :

As we can observe, the predicted particles and their path are indicated with a transparent grey color!

EndNotes

This article tried to explore and learn about Symbolic Models with Inductive Biases and how they can be created using Deep Learning Networks. You can find the implemented code in the following colab notebook here.

REFERENCES

Access all our open Survey & Awards Nomination forms in one place

Victor Dey

Victor is an aspiring Data Scientist & is a Master of Science in Data Science & Big Data Analytics. He is a Researcher, a Data Science Influencer and also an Ex-University Football Player. A keen learner of new developments in Data Science and Artificial Intelligence, he is committed to growing the Data Science community.