Abstraction is a common trait among today’s widely used machine learning libraries and frameworks. Sweeping the nitty-gritty details under the rug and concentrating on implementing algorithms with more ease is what any data scientist would like to do.
TensorFlow rose to prominence for the very same reason: abstraction.
TensorFlow’s machine learning platform has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications.
TensorFlow 2.0 has now been redesigned with a focus on developer productivity, simplicity, and ease of use.
There are multiple changes in TensorFlow 2.0 to make TensorFlow users more productive. TensorFlow 2.0 removes redundant APIs, makes APIs more consistent (Unified RNNs, Unified Optimizers), and improves integration with the Python runtime through eager execution.
Here are a few changes worth noticing:
Cleaner API
Many APIs are either gone or moved in TF 2.0. Some of the major changes include removing tf.app, tf.flags, and tf.logging in favor of the now open-source absl-py, and rehoming projects that lived in tf.contrib.
Some APIs have been replaced with their 2.0 equivalents: tf.summary, tf.keras.metrics, and tf.keras.optimizers.
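As a rough illustration of the move away from tf.flags and tf.logging, the sketch below defines and reads a command-line flag with absl-py. The flag name `epochs`, the script name `train.py` and the helper `parse_and_report` are hypothetical, chosen only for this example.

```python
from absl import flags, logging

FLAGS = flags.FLAGS

# Hypothetical flag, defined much like tf.flags.DEFINE_integer used to be.
flags.DEFINE_integer("epochs", 1, "Number of training epochs.")


def parse_and_report(argv):
    """Parses argv (program name first) and logs the chosen flag value."""
    FLAGS(argv)  # Explicit parse; absl's app.run(main) normally does this step.
    logging.info("Training for %d epochs", FLAGS.epochs)
    return FLAGS.epochs


epochs = parse_and_report(["train.py", "--epochs=3"])
```

In a real script, `app.run(main)` from absl would parse `sys.argv` and call `main`, taking the place of `tf.app.run()`.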
Python-Like Execution
TensorFlow 1.X required users to manually stitch together a graph by making tf.* API calls. TensorFlow 2.0, by contrast, executes eagerly (like Python normally does), and graphs and sessions become more like implementation details.
This eliminates the need for tf.control_dependencies(), as all lines of code execute in order.
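The following minimal sketch shows what eager execution looks like in practice, assuming TensorFlow 2.x is installed. The names `product`, `counter` and `update` are illustrative only.

```python
import tensorflow as tf

# Eager execution: each operation runs immediately and returns a concrete value.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
product = tf.matmul(a, a)
product_values = product.numpy()  # a plain NumPy array, no session needed

# Stateful operations run in the order they are written, eagerly and
# inside a tf.function alike, without tf.control_dependencies().
counter = tf.Variable(1.0)


@tf.function
def update(v):
    v.assign_add(1.0)  # 1.0 -> 2.0
    v.assign(v * 2.0)  # 2.0 -> 4.0
    return v.read_value()


final_value = update(counter)
```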
Control Over Variables
TensorFlow 1.X required the user to keep track of variables in order to recover them later. The earlier version of TensorFlow relied heavily on implicit global namespaces.
Calling tf.Variable() would place the variable in the default graph, where it stayed even if the user lost track of the Python variable pointing to it. The user could recover it later, but only by knowing the name it was created with.
So if a data scientist wasn’t part of the initial stages of building a pipeline, it was really difficult to recover something they never knew existed.
TensorFlow 2.0 eliminates all of these mechanisms (Variables 2.0 RFC) in favor of the default mechanism: keep track of your variables. If the user loses track of a tf.Variable, it gets garbage collected.
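One way to keep track of variables without global namespaces is to hold them as attributes of a Python object. The sketch below uses tf.Module for this; the class name `Affine` and its sizes are hypothetical and only serve the illustration.

```python
import tensorflow as tf


class Affine(tf.Module):
    """Hypothetical module that owns its variables as plain attributes."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.w = tf.Variable(tf.ones([in_features, out_features]), name="w")
        self.b = tf.Variable(tf.zeros([out_features]), name="b")

    def __call__(self, x):
        return tf.matmul(x, self.w) + self.b


layer = Affine(3, 2)
output = layer(tf.ones([1, 3]))

# The variables are reached through the object that holds them,
# not through a global collection looked up by name.
tracked = layer.trainable_variables
```

Once `layer` goes out of scope and nothing else refers to it, its variables are no longer reachable either.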
Graph Mode Functions & Autograph
TensorFlow 2.0’s tf.function() allows the user to mark a Python function so that it runs as a single graph (Functions 2.0 RFC). This mechanism lets TensorFlow 2.0 keep the benefits of graph mode: performance, since the function can be optimised (node pruning, kernel fusion, and so on), and portability, since the function can be exported and re-imported.
# TensorFlow 1.X
outputs = session.run(f(placeholder), feed_dict={placeholder: input})
# TensorFlow 2.0
outputs = f(input)
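A small, self-contained version of the TensorFlow 2.0 call style might look as follows. The function body is an arbitrary example, and the final lines simply peek at the graph that tf.function traced behind the scenes.

```python
import tensorflow as tf


@tf.function
def f(x):
    # Traced into a graph on the first call, then reused for matching inputs.
    return tf.reduce_sum(x * x)


inputs = tf.constant([1.0, 2.0, 3.0])
outputs = f(inputs)  # called like an ordinary Python function

# The underlying graph is still there if you want to inspect it.
concrete = f.get_concrete_function(inputs)
op_types = [op.type for op in concrete.graph.get_operations()]
```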
AutoGraph will convert a subset of Python constructs into their TensorFlow equivalents:
- for/while -> tf.while_loop (break and continue are supported)
- if -> tf.cond
- for _ in dataset -> dataset.reduce
AutoGraph supports arbitrarily nested control flow, which makes it possible to implement complex machine learning programs such as reinforcement learning and custom training loops.
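To see these conversions at work, here is a minimal sketch of a tf.function whose loop and branch both depend on tensor values. The function name `sum_even_below` is made up for the example.

```python
import tensorflow as tf


@tf.function
def sum_even_below(limit):
    total = tf.constant(0)
    i = tf.constant(0)
    while i < limit:      # tensor-dependent condition, converted to tf.while_loop
        if i % 2 == 0:    # tensor-dependent condition, converted to tf.cond
            total += i
        i += 1
    return total


result = sum_even_below(tf.constant(7))  # adds 0, 2, 4 and 6
```

The Python source stays readable, while the nested `while` and `if` run as graph operations.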
Managing Variables With Keras
Keras models and layers offer the convenient variables and trainable_variables properties, which recursively gather up all dependent variables.
Without Keras:
def dense(x, W, b):
    return tf.nn.sigmoid(tf.matmul(x, W) + b)

@tf.function
def multilayer_perceptron(x, w0, b0, w1, b1, w2, b2 …):
    x = dense(x, w0, b0)
    x = dense(x, w1, b1)
    x = dense(x, w2, b2)
    …
# You still have to manage w_i and b_i, and their shapes are defined far away from the code.
With Keras:
# Each layer can be called, with a signature equivalent to linear(x)
layers = [tf.keras.layers.Dense(hidden_size, activation=tf.nn.sigmoid) for _ in range(n)]
perceptron = tf.keras.Sequential(layers)
Keras layers and models are integrated with @tf.function, and there is no need to use the .fit() API to benefit from this integration.
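To make the variables and trainable_variables properties concrete, the sketch below builds the same kind of stack with small, hypothetical sizes (`hidden_size = 4`, `n = 3`, eight input features) and inspects what Keras gathers.

```python
import tensorflow as tf

hidden_size, n = 4, 3  # hypothetical sizes for the illustration
layers = [tf.keras.layers.Dense(hidden_size, activation=tf.nn.sigmoid)
          for _ in range(n)]
perceptron = tf.keras.Sequential(layers)

# The first call creates the variables, with shapes inferred from the input.
_ = perceptron(tf.zeros([1, 8]))

per_layer = layers[1].trainable_variables   # kernel and bias of one layer
all_vars = perceptron.trainable_variables   # kernels and biases of every layer
```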
Here is a look at how Keras makes it easy to collect a subset of relevant variables when training a multi-headed model with a shared trunk:
trunk = tf.keras.Sequential([...])
head1 = tf.keras.Sequential([…])
head2 = tf.keras.Sequential([…])
path1 = tf.keras.Sequential([trunk, head1])
path2 = tf.keras.Sequential([trunk, head2])
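A possible training step for one of the two paths is sketched below. The layer sizes, the random data and the squared-error loss are assumptions made for the example, since the original snippet elides the layer definitions. The point is that `path1.trainable_variables` covers the trunk and the first head only.

```python
import numpy as np
import tensorflow as tf

# Hypothetical layer choices; the elided definitions above may differ.
trunk = tf.keras.Sequential([tf.keras.layers.Dense(8, activation="relu")])
head1 = tf.keras.Sequential([tf.keras.layers.Dense(1)])
head2 = tf.keras.Sequential([tf.keras.layers.Dense(1)])
path1 = tf.keras.Sequential([trunk, head1])
path2 = tf.keras.Sequential([trunk, head2])

x = tf.random.normal([16, 4])
y = tf.random.normal([16, 1])
_ = path1(x), path2(x)  # first calls create the variables

head2_before = [v.numpy().copy() for v in head2.trainable_variables]
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

# One gradient step on path1: only the trunk and head1 are updated.
with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(path1(x) - y))
gradients = tape.gradient(loss, path1.trainable_variables)
optimizer.apply_gradients(zip(gradients, path1.trainable_variables))

head2_after = [v.numpy() for v in head2.trainable_variables]
head2_unchanged = all(
    np.array_equal(before, after)
    for before, after in zip(head2_before, head2_after)
)
```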
Dataset Iterations
tf.data.Dataset is the best way to stream training data from disk when iterating over training data that doesn’t fit into memory. Datasets are iterables (not iterators), and work just like other Python iterables in eager mode.
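@tf.function
def train(model, dataset, optimizer):
    for x, y in dataset:
        with tf.GradientTape() as tape:
            prediction = model(x)  # loop body completed as a sketch
            loss = loss_fn(prediction, y)  # loss_fn is assumed to be defined elsewhere
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))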
If you use the Keras .fit() API, there is no need to worry about dataset iteration.
model.compile(optimizer=optimizer, loss=loss_fn)
model.fit(dataset)
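Both styles can be tried on a toy in-memory dataset. In the sketch below, the random features and labels, the single Dense layer and the SGD settings are all placeholders rather than anything prescribed by TensorFlow.

```python
import tensorflow as tf

# Toy data standing in for a real input pipeline.
features = tf.random.normal([32, 4])
labels = tf.random.normal([32, 1])
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(8)

# In eager mode a Dataset can be iterated like any other Python iterable.
batch_shapes = [x.shape.as_list() for x, _ in dataset]

# With .fit(), Keras handles the iteration itself.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
    loss=tf.keras.losses.MeanSquaredError(),
)
history = model.fit(dataset, epochs=2, verbose=0)
```

Because a Dataset is an iterable rather than an iterator, it can be looped over again for each epoch without being recreated.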
One common place where data-dependent control flow appears is in sequence models. tf.keras.layers.RNN wraps an RNN cell, allowing you to either statically or dynamically unroll the recurrence.
class DynamicRNN(tf.keras.Model):
    def __init__(self, rnn_cell):
        super(DynamicRNN, self).__init__()
        self.cell = rnn_cell
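As a simpler alternative to writing the loop by hand, the sketch below wraps a cell in tf.keras.layers.RNN and switches between the two modes with the `unroll` argument. The SimpleRNNCell, its size and the input shape are arbitrary choices for the illustration.

```python
import tensorflow as tf

inputs = tf.random.normal([2, 5, 3])  # [batch, time, features]

# Dynamic recurrence (the default): the loop over time steps is symbolic.
dynamic_rnn = tf.keras.layers.RNN(tf.keras.layers.SimpleRNNCell(8))

# Static recurrence: the loop is unrolled, which needs a known number of steps.
static_rnn = tf.keras.layers.RNN(tf.keras.layers.SimpleRNNCell(8), unroll=True)

dynamic_out = dynamic_rnn(inputs)
static_out = static_rnn(inputs)
```

Both layers return the final output of the cell, one vector of size 8 per batch element.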
For a more detailed overview of AutoGraph’s features, see the guide.
Data Aggregation
Unlike in TF 1.x, summaries are emitted directly to the writer; there is no separate “merge” op and no separate add_summary() call, which means that the step value must be provided at the call site. tf.metrics can be used to aggregate data and tf.summary to log it.
summary_writer = tf.summary.create_file_writer('/tmp/summaries')
with summary_writer.as_default():
    tf.summary.scalar('loss', 0.1, step=42)
To aggregate data before logging it as summaries, use tf.metrics. Metrics are stateful; they accumulate values and return a cumulative result when .result() is called. To clear the accumulated values, use .reset_states().
def train(model, optimizer, dataset, log_freq=10):
    avg_loss = tf.keras.metrics.Mean(name='loss', dtype=tf.float32)

def test(model, test_x, test_y, step_num):
    loss = loss_fn(model(test_x), test_y)
    tf.summary.scalar('loss', loss, step=step_num)
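The aggregate-then-log pattern can be sketched end to end as follows. The loss values are stand-ins for real per-step losses, the temporary log directory and the logging frequency of 2 are assumptions, and the `reset` lookup exists only because newer Keras releases name the method reset_state() instead of reset_states().

```python
import os
import tempfile

import tensorflow as tf

log_dir = tempfile.mkdtemp()  # hypothetical log location
summary_writer = tf.summary.create_file_writer(log_dir)
avg_loss = tf.keras.metrics.Mean(name="loss", dtype=tf.float32)

# Newer Keras releases call this reset_state(); older ones reset_states().
reset = getattr(avg_loss, "reset_state", None) or avg_loss.reset_states

losses = [4.0, 2.0, 3.0, 1.0]  # stand-in values for per-step losses
log_freq = 2
logged = []
with summary_writer.as_default():
    for step, loss in enumerate(losses, start=1):
        avg_loss.update_state(loss)  # accumulate
        if step % log_freq == 0:
            mean = float(avg_loss.result())  # cumulative mean since last reset
            tf.summary.scalar("loss", mean, step=step)
            logged.append((step, mean))
            reset()  # start a fresh average for the next window
summary_writer.flush()
event_files = os.listdir(log_dir)
```

Pointing TensorBoard at `log_dir` would then show one averaged loss value per logging window.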
In addition to the above changes, TensorFlow 2.0 now fully supports Estimators, and the team promises to fix issues along the way.
Track the update project pipeline here.