TensorFlow released its new version 2.4 earlier this week. This version promises increased support for distributed training, a new NumPy frontend, and tools for monitoring and diagnosing performance bottlenecks. Notably, the release arrives around the same time TensorFlow celebrated five years of its existence, and comes on the heels of its rival PyTorch shipping its own v1.7.1 release.
Some of the new features and updates of TensorFlow’s new release are discussed in this article.
New Features For Distributed Training
Parameter Server Strategy: The TensorFlow Distribute module, TensorFlow's API for distribution strategies, gains experimental support in v2.4 for asynchronous training of models with Parameter Server Strategy. Experimental APIs are candidates for eventual inclusion in stable TensorFlow but may still undergo backwards-incompatible changes.
Parameter Server Strategy is a common data-parallel method for scaling up a machine learning model across several machines. It consists of workers and parameter servers, where workers read and update the variables created on the parameter servers. Since each worker reads and updates these variables independently, without synchronising with the others, this approach is also called asynchronous training.
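The coordinator-and-workers pattern above can be sketched roughly as follows. This is a minimal illustration, not a complete training job: it assumes the process is launched as the coordinator of a cluster whose workers and parameter servers are described by the TF_CONFIG environment variable, and the model, loss, and dataset are placeholder choices.

```python
import tensorflow as tf

def run_ps_training(steps=100):
    """Sketch of asynchronous training with ParameterServerStrategy (TF 2.4)."""
    # The cluster layout (workers, parameter servers) is read from TF_CONFIG.
    resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()
    strategy = tf.distribute.experimental.ParameterServerStrategy(resolver)

    with strategy.scope():
        # Variables created under the strategy scope live on the parameter servers.
        model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
        optimizer = tf.keras.optimizers.SGD(0.01)

    # The coordinator dispatches training steps to workers asynchronously.
    coordinator = tf.distribute.experimental.coordinator.ClusterCoordinator(strategy)

    @tf.function
    def step_fn(iterator):
        x, y = next(iterator)
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean(tf.square(model(x) - y))
        grads = tape.gradient(loss, model.trainable_variables)
        # Each worker reads and updates the parameter-server variables
        # independently -- no synchronisation between workers.
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss

    def dataset_fn():
        xs = tf.random.normal([64, 4])
        ys = tf.random.normal([64, 1])
        return tf.data.Dataset.from_tensor_slices((xs, ys)).repeat().batch(8)

    per_worker_iter = iter(coordinator.create_per_worker_dataset(dataset_fn))
    for _ in range(steps):
        # schedule() returns immediately; steps run on whichever worker is free.
        coordinator.schedule(step_fn, args=(per_worker_iter,))
    coordinator.join()  # wait for all scheduled steps to finish
```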
Multi Worker Mirrored Strategy: This strategy implements distributed training with synchronous data parallelism, like its counterpart Mirrored Strategy. The difference between the two is that the former can train across multiple machines, each running several GPUs. To keep the variables in sync, Multi Worker Mirrored Strategy uses CollectiveOps, operations that can automatically choose an all-reduce algorithm based on the hardware and the network topology.
In the new version, Multi Worker Mirrored Strategy has graduated from experimental and is now a part of a stable API.
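A minimal sketch of the stable API is below. In a real job, each worker process runs this same script and the TF_CONFIG environment variable tells each worker its role; the model and dataset here are placeholders for illustration.

```python
import tensorflow as tf

def build_and_train():
    """Synchronous multi-worker training sketch (stable API as of TF 2.4)."""
    # Without TF_CONFIG this falls back to a single-worker setup, which is
    # handy for local testing of the same code path.
    strategy = tf.distribute.MultiWorkerMirroredStrategy()

    with strategy.scope():
        # Variables are mirrored on every worker's devices and kept in sync
        # with collective all-reduce operations after each step.
        model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
        model.compile(optimizer="sgd", loss="mse")

    xs = tf.random.normal([64, 4])
    ys = tf.random.normal([64, 1])
    dataset = tf.data.Dataset.from_tensor_slices((xs, ys)).batch(8)
    model.fit(dataset, epochs=2, verbose=0)
    return model
```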
New Updates For Keras
Mixed Precision: While most TensorFlow models use the 32-bit floating-point type, some lower-precision models use the 16-bit floating-point type to limit memory use. Mixed Precision makes use of both 16-bit and 32-bit floating-point types during training to make the model run faster and use less memory. The Mixed Precision API can improve model performance by more than three times on GPUs and by up to 60% on TPUs.
In v2.4, Mixed Precision has moved out of the experimental phase and has been instated as a stable API.
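Usage of the now-stable API can be sketched as follows; the layer sizes are arbitrary placeholders. Setting a global policy makes layers compute in float16 while keeping their variables in float32 for numerical stability.

```python
import tensorflow as tf
from tensorflow.keras import mixed_precision

# Stable in TF 2.4 (previously under tf.keras.mixed_precision.experimental).
mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(16,)),
    # The final layer is kept in float32 so the model's outputs (and the
    # loss computed from them) stay numerically stable.
    tf.keras.layers.Dense(10, dtype="float32"),
])

# Under the policy, computations run in float16 ...
print(model.layers[0].compute_dtype)
# ... while the underlying variables remain float32.
print(model.layers[0].variable_dtype)
```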
Optimizers: In TensorFlow, optimizers are classes that implement the update rules used to train a specific model while improving its speed and performance. Examples of optimizers include SGD, RMSprop, Adam, Adadelta, Adagrad, Adamax, Nadam, and Ftrl.
The new TensorFlow release refactors the Keras Optimizer class to enable its use in custom training loops, which helps in writing training code that works with any optimizer. Additionally, all the Optimizer subclasses now accept gradient_transformers and gradient_aggregator arguments to easily define custom gradient transformations.
Improvements to Functional API Model: The new release also includes a major refactoring of the internals of the Keras Functional API. This refactoring is expected to reduce the memory consumption of functional models and simplify their triggering logic.
Experimental Support For NumPy API
TensorFlow v2.4 introduces experimental support for a subset of the NumPy API, which makes it possible to run NumPy code on TensorFlow. This NumPy API is built on top of TensorFlow and interoperates seamlessly with it, giving access to all TensorFlow APIs and providing optimised execution through compilation and auto-vectorisation.
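The interoperation can be sketched as below: NumPy-style calls are executed by TensorFlow, and the resulting values flow straight into regular TensorFlow ops. The particular arrays and operations are arbitrary examples.

```python
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

# NumPy-style construction and math, executed by TensorFlow.
x = tnp.asarray([[1.0, 2.0], [3.0, 4.0]])
y = tnp.ones((2, 2))
z = tnp.dot(x, y) + tnp.mean(x)  # [[5.5, 5.5], [9.5, 9.5]]

# The same values interoperate with plain TensorFlow APIs.
w = tf.reduce_sum(z)
print(float(w))  # 30.0
```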
TensorFlow 2.4 also enables support for the newly introduced NVIDIA Ampere GPU architecture, as it runs with both CUDA 11 and cuDNN 8. For the uninitiated, CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model from NVIDIA, and cuDNN is a library of deep neural network primitives built on CUDA.
The full release notes can be found on TensorFlow's GitHub repository.