As the popularity of and need for deep learning networks increase, there has been a lot of effort to build tools that ease the development of deep learning models. One such tool that we will discuss today is MXNet. You might be wondering what makes MXNet better than existing deep learning frameworks like Theano or Caffe. Many existing frameworks are tied to a specific programming language. MXNet overcomes this problem by providing one system that can be used from many different programming languages.
In this article, we will look at:
- Why MXNet?
- A complete overview of MXNet
- Implementation of MXNet on random data
Why MXNet?
MXNet is an open-source deep learning framework used to define, train and deploy neural networks. The name is short for "mix-net", because the framework was developed by combining different programming approaches (symbolic and imperative) into one. It supports Python, R, C++, Julia, Perl and many other languages, so you can keep working in a language you already know instead of learning a new one just to use a different framework.
Another advantage is that models built with MXNet are portable and can fit in small amounts of memory. So, once your model is trained and tested, it can easily be deployed to mobile devices or other connected devices. MXNet also scales to run on multiple GPUs and multiple machines at the same time, which is one of the reasons Amazon chose it as the deep learning framework for its web services.
A Complete Overview of MXNet
The MXNet architecture is made up of several components. I will discuss the most important ones below.
The NDArray: The primary data type of the MXNet framework is the NDArray, an n-dimensional array whose elements all share the same data type. If you have worked with Python’s NumPy arrays, NDArrays will feel familiar. Deep neural networks have thousands (often millions) of parameters, and all of them are stored in these arrays. By default, an NDArray holds 32-bit floats, but the data type can be changed with the dtype argument.
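To make the point about element types concrete, here is a small sketch that uses Python's standard-library `array` module as a stand-in for an NDArray. This is an analogy, not MXNet code: like an NDArray, an `array.array` stores every element with one fixed type, where type code `'f'` is a 32-bit float and `'d'` a 64-bit float. The model size of one million parameters is a hypothetical figure chosen only for the arithmetic.

```python
import array

# Hypothetical model size, used only for the memory arithmetic below
NUM_PARAMS = 1_000_000

# array.array stores all elements with the same C type, much like an NDArray's dtype
params_f32 = array.array('f', [0.1, 0.2, 0.3])  # 32-bit floats (MXNet's default)
params_f64 = array.array('d', [0.1, 0.2, 0.3])  # 64-bit floats (NumPy's default)


def storage_megabytes(itemsize: int, count: int) -> float:
    """Memory needed for `count` elements of `itemsize` bytes, in megabytes (10**6 bytes)."""
    return itemsize * count / 1e6


mb_f32 = storage_megabytes(params_f32.itemsize, NUM_PARAMS)
mb_f64 = storage_megabytes(params_f64.itemsize, NUM_PARAMS)

print(f"float32: {params_f32.itemsize} bytes/element, {mb_f32:.1f} MB for {NUM_PARAMS} params")
print(f"float64: {params_f64.itemsize} bytes/element, {mb_f64:.1f} MB for {NUM_PARAMS} params")

# Storing 0.1 in 32 bits is less precise than Python's own 64-bit float
print(params_f32[0], params_f64[0])
```

Halving the bytes per element halves the memory the parameters need, which is a common reason deep learning frameworks default to 32-bit floats despite the small loss of precision.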
The Symbolic API: Inside a given layer of a neural network, many computations are independent of each other and can happen at the same time, and independent layers can also run in parallel. For good performance, this parallelism has to be exploited, for example with multithreading. MXNet does this through dataflow programming and its symbolic API.
Dataflow programming is a style of parallel programming in which a computation is described as a graph: nodes are operations and edges carry data between them. Each operation runs as soon as its inputs are ready, so independent operations can execute simultaneously without the programmer specifying how they are scheduled.
In the figure above, (A*B) and (C*D) do not depend on each other, so they can be executed at the same time. A, B, C, D and E are all symbols, and MXNet uses the dependency information between them to schedule independent operations in parallel and optimise the computation.
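The idea behind the figure can be sketched in plain Python. This is a toy dataflow scheduler, not MXNet's execution engine, and it assumes the graph in the figure computes E = (A*B) + (C*D); the `Node` class and `run_graph` function are hypothetical names. Each node is launched as soon as all of its inputs are available, so the two independent products are submitted to a thread pool together.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """Hypothetical graph node: an operation plus the names of the values it reads."""
    op: Callable[..., float]
    inputs: List[str] = field(default_factory=list)


def run_graph(graph: Dict[str, Node], feeds: Dict[str, float]) -> Dict[str, float]:
    """Evaluate all nodes, launching each one as soon as its inputs are available."""
    values = dict(feeds)   # known values: the fed inputs plus finished nodes
    pending = dict(graph)  # nodes that have not been launched yet
    running = {}           # future -> name of the node it computes
    with ThreadPoolExecutor() as pool:
        while pending or running:
            # Launch every node whose inputs are ready; independent nodes run together
            ready = [name for name, node in pending.items()
                     if all(i in values for i in node.inputs)]
            for name in ready:
                node = pending.pop(name)
                running[pool.submit(node.op, *(values[i] for i in node.inputs))] = name
            if not running:
                raise ValueError("graph has a cycle or an input that was never fed")
            # Block until at least one node finishes, then record its result
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                values[running.pop(fut)] = fut.result()
    return values


# Assumed graph from the figure: E = (A*B) + (C*D)
graph = {
    "AB": Node(lambda a, b: a * b, ["A", "B"]),
    "CD": Node(lambda c, d: c * d, ["C", "D"]),  # independent of AB, so it runs in parallel
    "E": Node(lambda ab, cd: ab + cd, ["AB", "CD"]),
}

result = run_graph(graph, {"A": 2.0, "B": 3.0, "C": 4.0, "D": 5.0})
print(result["E"])  # 2*3 + 4*5 = 26.0
```

The program never says "run AB and CD in parallel"; the scheduler discovers it from the graph, which is the property MXNet's symbolic API relies on.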
Binder: As the name implies, binding attaches the data stored in NDArrays to the corresponding symbols so that the graph can be executed. At this point we also specify the context, that is, whether execution should take place on the CPU or the GPU. Once the data is bound to the symbols, forward propagation can take place.
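A toy version of binding in plain Python (not MXNet's API) shows the separation: a symbolic expression only records what to compute, and `bind` pairs every free variable with concrete data and a context before anything is executed. The `Var`, `Mul`, `Executor` and `bind` names, the `"cpu"`/`"gpu"` context strings and the element-wise list arithmetic are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, List

Vector = List[float]


@dataclass(frozen=True)
class Var:
    """A named placeholder; it carries no data until it is bound."""
    name: str


@dataclass(frozen=True)
class Mul:
    """Element-wise product of two symbols (recorded, not computed)."""
    lhs: object
    rhs: object


def free_vars(sym) -> set:
    """Collect the names of all variables a symbol depends on."""
    if isinstance(sym, Var):
        return {sym.name}
    return free_vars(sym.lhs) | free_vars(sym.rhs)


class Executor:
    """What binding produces: a symbol plus concrete data and a device context."""

    def __init__(self, sym, args: Dict[str, Vector], ctx: str):
        self.sym, self.args, self.ctx = sym, args, ctx

    def _eval(self, sym) -> Vector:
        if isinstance(sym, Var):
            return self.args[sym.name]
        left, right = self._eval(sym.lhs), self._eval(sym.rhs)
        return [a * b for a, b in zip(left, right)]

    def forward(self) -> Vector:
        return self._eval(self.sym)


def bind(sym, args: Dict[str, Vector], ctx: str = "cpu") -> Executor:
    """Check that every variable has data, then return an executor for the chosen context."""
    missing = free_vars(sym) - set(args)
    if missing:
        raise ValueError(f"unbound variables: {sorted(missing)}")
    if ctx not in ("cpu", "gpu"):
        raise ValueError(f"unknown context: {ctx}")
    return Executor(sym, args, ctx)


# Build the symbol A*B first; no numbers are involved yet
expr = Mul(Var("A"), Var("B"))
# Binding attaches data and a context; only then can forward propagation run
exe = bind(expr, {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, ctx="cpu")
print(exe.forward())  # [4.0, 10.0, 18.0]
```

Keeping the graph separate from the data is what lets the same symbol be bound to different arrays or devices without being redefined.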
KV Store: This is a key-value store used to synchronise data, such as parameters and gradients, across multiple devices. It has two main operations: push, which sends a key-value pair to the store, and pull, which retrieves the current value for a key. Like the other components, it exists to support parallel computation and make the framework more efficient.
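The push/pull pattern can be sketched with a small in-memory store. This is a hypothetical toy, not MXNet's `KVStore` class: values pushed for the same key by several devices are summed, and an updater function decides how that aggregate changes the stored value. The updater is assumed to be a plain SGD step with an arbitrarily chosen learning rate of 0.1, and the "devices" are simulated by separate gradient lists.

```python
from typing import Callable, Dict, List

Vector = List[float]


class ToyKVStore:
    """Hypothetical in-memory key-value store illustrating push/pull synchronisation."""

    def __init__(self, updater: Callable[[Vector, Vector], Vector]):
        self._store: Dict[str, Vector] = {}
        self._updater = updater

    def init(self, key: str, value: Vector) -> None:
        self._store[key] = list(value)

    def push(self, key: str, values: List[Vector]) -> None:
        """Sum the values sent by all devices, then apply the updater to the stored value."""
        aggregated = [sum(parts) for parts in zip(*values)]
        self._store[key] = self._updater(self._store[key], aggregated)

    def pull(self, key: str) -> Vector:
        """Return a copy of the current value so every device sees the same parameters."""
        return list(self._store[key])


LEARNING_RATE = 0.1  # assumed value for illustration


def sgd_updater(weights: Vector, grad: Vector) -> Vector:
    # w <- w - lr * g, applied element-wise
    return [w - LEARNING_RATE * g for w, g in zip(weights, grad)]


kv = ToyKVStore(sgd_updater)
kv.init("layer1_weight", [1.0, 1.0])

# Two simulated devices each computed a gradient on their own mini-batch
grad_device0 = [0.5, 1.0]
grad_device1 = [1.5, 1.0]
kv.push("layer1_weight", [grad_device0, grad_device1])  # aggregated gradient: [2.0, 2.0]

weights = kv.pull("layer1_weight")
print(weights)
```

Each device pushes its own gradient and then pulls the shared result, so all devices continue training from identical weights.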
Implementation of MXNet on Random Data
Based on the description above, let us put these components to work to get a better understanding. For this implementation we will generate random data, so do not try to make sense of the results.
The first step is installing the package. I will use Python, but the official MXNet documentation also has tutorials for the other supported languages. To install MXNet, use this command:
```shell
pip install mxnet
```
Once the installation is done, we will create a dataset and store it in NDArrays.
```python
import mxnet as mx
import numpy as np

custom_data = 1000                # total number of samples
trainset = 800                    # samples used for training
testset = custom_data - trainset  # samples used for testing
features_size = 100               # features per sample
targets_size = 10                 # number of classes
batch = 10                        # mini-batch size

# Random features drawn uniformly from [0, 1)
dataset = mx.nd.random.uniform(low=0, high=1, shape=(custom_data, features_size))

# Random integer class labels in [0, targets_size) for every sample, stored as 32-bit floats
target = mx.nd.array(np.random.randint(0, targets_size, size=custom_data), dtype='float32')
```
We have generated 1000 random data points, each with 100 features. The target contains integers between 0 and 9 (stored as floats), and all of the data is held in NDArrays. Let us split the data into training and test sets, using 80% of the samples for training and 20% for testing.
```python
# First 800 rows for training, the remaining 200 for testing (all feature columns kept)
xtrain = dataset[0:trainset]
xtest = dataset[trainset:custom_data]
ytrain = target[0:trainset]
ytest = target[trainset:custom_data]
```
The next step is to define symbols, which MXNet uses to build a computation graph so that independent operations can run in parallel.
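```python
# Symbol placeholder for the input features
data = mx.sym.Variable('data')
```
Now that we have assigned a symbol for data, let us build the model: a fully connected hidden layer with 64 units and ReLU activation, followed by a fully connected output layer with one unit per class and a softmax output.
```python
layer1 = mx.sym.FullyConnected(data, name='layer1', num_hidden=64)
relu1 = mx.sym.Activation(layer1, name='relu1', act_type='relu')
# One output unit per class (targets_size = 10)
layer2 = mx.sym.FullyConnected(relu1, name='layer2', num_hidden=targets_size)
# Softmax output; its label input is named 'softmax_label', matching NDArrayIter's default label name
output = mx.sym.SoftmaxOutput(layer2, name='softmax')

# Wrap the symbol graph in a Module, which handles binding, initialisation and training
model = mx.mod.Module(output)

# Iterator that feeds the training NDArrays to the module in mini-batches
train_iteration = mx.io.NDArrayIter(data=xtrain, label=ytrain, batch_size=batch)
```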
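As a sanity check on the model defined above, its number of trainable parameters can be worked out by hand: a fully connected layer with n inputs and m hidden units has an n×m weight matrix plus m biases. The sketch assumes 100 input features (`features_size`), 64 hidden units and 10 classes; `dense_params` is a hypothetical helper, not an MXNet function.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


features_size, hidden_units, targets_size = 100, 64, 10

layer1_params = dense_params(features_size, hidden_units)  # 100*64 + 64
layer2_params = dense_params(hidden_units, targets_size)   # 64*10 + 10
total_params = layer1_params + layer2_params

print(layer1_params, layer2_params, total_params)  # 6464 650 7114
```

The ReLU and softmax layers add no parameters, so this small network holds about seven thousand weights, in line with the "thousands of parameters" that NDArrays have to store.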
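Once our symbols are defined and the NDArrays are wrapped in an iterator, we need to bind the two together. Binding allocates memory for the data and label shapes provided by the iterator.
Let us then initialise the parameters, assign an optimizer and fit the model on the training data.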
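```python
# Bind: allocate memory for the data and label shapes reported by the iterator
model.bind(data_shapes=train_iteration.provide_data,
           label_shapes=train_iteration.provide_label)
# Initialise the weights (small uniform random values by default)
model.init_params()
# Stochastic gradient descent with a learning rate of 0.1
model.init_optimizer(optimizer='sgd', optimizer_params=(('learning_rate', 0.1),))
model.fit(train_iteration, num_epoch=50)
```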
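To see how much work `fit` does in this setup, the number of SGD updates follows from the sizes defined earlier: 800 training samples in mini-batches of 10 give 80 batches per epoch, and 50 epochs give 4,000 parameter updates. The `sgd_step` helper below is a hypothetical plain-Python illustration of the basic update rule w ← w − lr·g; it ignores the gradient rescaling and extra options (momentum, weight decay) that MXNet's optimizer supports.

```python
import math

trainset, batch, num_epoch = 800, 10, 50
learning_rate = 0.1

batches_per_epoch = math.ceil(trainset / batch)  # 80 mini-batches per pass over the data
total_updates = batches_per_epoch * num_epoch    # one optimizer step per mini-batch


def sgd_step(weights, grads, lr):
    """Vanilla SGD update: move each weight against its gradient, scaled by lr."""
    return [w - lr * g for w, g in zip(weights, grads)]


print(batches_per_epoch, total_updates)  # 80 4000
new_weights = sgd_step([0.5, -0.2], [1.0, -2.0], learning_rate)
print(new_weights)
```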
Though the training accuracy may look great, this is not an actual dataset; the example was only meant to explain the NDArrays, binding and symbols used in MXNet.
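One way to see why the accuracy means nothing here: the labels are drawn uniformly at random from 10 classes and have no relationship to the features, so on held-out data any model can only be right about 1 time in 10 in expectation, however well it fits the training set. The plain-Python simulation below, with an arbitrary seed and sample size, compares random labels against two arbitrary predictors: one that always guesses class 0 and one that guesses at random.

```python
import random

NUM_CLASSES = 10
NUM_SAMPLES = 100_000   # arbitrary, large enough for a stable estimate
rng = random.Random(0)  # fixed seed so the run is reproducible

# Labels with no relationship to any input, like the random targets in this article
labels = [rng.randrange(NUM_CLASSES) for _ in range(NUM_SAMPLES)]

# Two arbitrary "models": a constant guess and an independent random guess
constant_preds = [0] * NUM_SAMPLES
random_preds = [rng.randrange(NUM_CLASSES) for _ in range(NUM_SAMPLES)]


def accuracy(preds, targets):
    """Fraction of positions where the prediction equals the label."""
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


acc_constant = accuracy(constant_preds, labels)
acc_random = accuracy(random_preds, labels)
print(f"constant: {acc_constant:.3f}, random: {acc_random:.3f}, chance: {1 / NUM_CLASSES}")
```

A network with thousands of parameters may be able to partly memorise 800 random training labels, so a high training accuracy on this data says nothing about how the model would perform on new samples.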
MXNet is a machine learning library that combines symbolic expressions with array computation to maximise efficiency and flexibility. Efficient parallel computation of this kind makes it practical to run deep learning models even on systems without a dedicated GPU. MXNet is officially an Apache project and an up-and-coming framework for developers working in many programming languages.