Developers of AI systems have entered a phase where tweaking algorithms and pushing up accuracy is no longer enough on its own. Questions of fairness and privacy matter more now than ever. But an organisation cannot expect a machine learning engineer to build, from scratch, tools that cater to the different demands at each stage of a pipeline. Google now offers a one-stop toolkit for these challenges through its TensorFlow community. The TensorFlow team has built tools to help catch and address the errors that surface in data collection, processing, loading and deployment.
From data cards to differential privacy, the toolkit comprises a variety of tools, which we go through in the next section.
Tools For Every Part Of The Pipeline
PAIR
The People + AI Guidebook, published by Google's People + AI Research (PAIR) team, was written to help user experience (UX) professionals and product managers follow a human-centred approach to AI. It outlines the key questions to ask in product development. For example, if a medical model is built to screen individuals for a disease, a failure of the model may have critical repercussions that both doctors and users need to know about. The guidebook acts as a reference point by providing key questions and worksheets to use as one defines the problem.
TensorFlow Data Validation (TFDV)
A dataset should be considered a product in its own right, and it is important to understand what the dataset represents and the gaps that may have crept in during the collection process. TFDV analyses the dataset and slices it across different features to show how the data is distributed. TFDV is used within TFX pipelines and relies on Facets Overview for visualisation, which helps one quickly understand the distribution of values across the features in the dataset. This removes the need to maintain a separate codebase to monitor training and production pipelines for skew.
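To make the idea of training versus production skew concrete, here is a minimal sketch in plain Python. It does not use the TFDV API; the helper names, the toy data and the threshold are all assumptions made for illustration. It compares how often each value of a categorical feature appears in two datasets and reports the largest difference.

```python
from collections import Counter


def value_distribution(rows, feature):
    """Return the relative frequency of each value of `feature` in `rows`."""
    counts = Counter(row[feature] for row in rows)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def linf_distance(dist_a, dist_b):
    """Largest absolute difference in frequency over all observed values."""
    values = set(dist_a) | set(dist_b)
    return max(abs(dist_a.get(v, 0.0) - dist_b.get(v, 0.0)) for v in values)


# Hypothetical toy data: one categorical feature seen in training and in production.
training_rows = [{"country": "IN"}] * 6 + [{"country": "US"}] * 4
serving_rows = [{"country": "IN"}] * 2 + [{"country": "US"}] * 8

train_dist = value_distribution(training_rows, "country")
serve_dist = value_distribution(serving_rows, "country")
skew = linf_distance(train_dist, serve_dist)

SKEW_THRESHOLD = 0.1  # assumed tolerance, chosen only for this example
has_skew = skew > SKEW_THRESHOLD
print(f"skew={skew:.2f} flagged={has_skew}")
```

A real pipeline would run such a check on every feature and on every new batch of production data, which is the kind of bookkeeping a validation library takes over.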
Data Cards
The analysis generated by TFDV can be used to create Data Cards for the datasets when appropriate. TensorFlow defines a Data Card as a transparency report for the dataset that provides insight into the collection, processing, and usage practices.
TensorFlow Federated
Developers can now apply federated learning to their own models by using the TensorFlow Federated library. Federated learning enables many devices or clients to jointly train machine learning models while keeping their data local. Keeping the data local provides benefits around privacy, and helps protect against risks of centralized data collection, like theft or large-scale misuse.
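The core loop of federated learning can be sketched without any library. The example below is not the TensorFlow Federated API; it is a toy version of federated averaging under assumed names and data, fitting a one-parameter model y = w * x. Each client trains on its own data, and only the resulting weight is sent back to be averaged.

```python
def local_update(weight, data, lr=0.1, epochs=5):
    """Run a few gradient steps on one client's private (x, y) pairs."""
    for _ in range(epochs):
        # Gradient of the mean squared error of y = weight * x.
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_average(client_weights, client_sizes):
    """Average client weights, weighting each by its number of examples."""
    total = sum(client_sizes)
    return sum(w * n for w, n in zip(client_weights, client_sizes)) / total


# Hypothetical clients; each list stays "on device" and is never pooled.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
]

global_weight = 0.0
for round_number in range(20):
    # Every client starts from the current global weight and trains locally.
    updates = [local_update(global_weight, data) for data in clients]
    # The server only ever sees the updated weights, not the raw data.
    global_weight = federated_average(updates, [len(d) for d in clients])

print(f"learned weight: {global_weight:.4f}")
```

Weighting by dataset size means a client with more examples has more influence on the shared model, which is one common design choice among several.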
TensorFlow Privacy
TensorFlow Privacy provides a set of optimizers that enable one to train with differential privacy from the start.
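What a differentially private optimizer typically does in each step can be sketched in a few lines. The code below is not the TensorFlow Privacy API; it is a plain Python illustration of the usual recipe (clip each example's gradient, sum, add Gaussian noise, average), and the function and parameter names are assumptions chosen for this example.

```python
import math
import random


def clip_gradient(grad, l2_norm_clip):
    """Scale `grad` down so that its L2 norm is at most `l2_norm_clip`."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, l2_norm_clip / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads, l2_norm_clip, noise_multiplier, rng):
    """Clip per-example gradients, sum them, add Gaussian noise, then average."""
    clipped = [clip_gradient(g, l2_norm_clip) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # The noise scale is tied to the clipping bound, which caps any
    # single example's contribution to the sum.
    stddev = noise_multiplier * l2_norm_clip
    noisy = [s + rng.gauss(0.0, stddev) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


# Hypothetical per-example gradients for a two-parameter model.
rng = random.Random(0)
per_example_grads = [[3.0, 4.0], [0.3, 0.4]]
private_grad = dp_average_gradient(
    per_example_grads, l2_norm_clip=1.0, noise_multiplier=1.1, rng=rng
)
print(private_grad)
```

Clipping limits how much any one record can move the model, and the noise masks what remains, so the trade-off between privacy and accuracy is largely set by these two parameters.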
TF Constrained Optimization and Lattice
TFCO and TensorFlow Lattice are libraries that help developers address fairness. In addition to building in privacy considerations when training a model, there may be a set of metrics that one wants to optimise for in order to create more equitable experiences across different groups. This is where the TFCO and Lattice libraries come in handy.
Fairness Indicators
Fairness Indicators enables evaluation of common fairness metrics for classification and regression models on extremely large datasets. Fairness Indicators is built on top of TensorFlow Model Analysis (TFMA), which contains a broader set of metrics for evaluating models across a range of concerns. Fairness Indicators can now be used with pandas to enable evaluations over more datasets and data types.
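As an illustration of what a sliced fairness metric looks like, the sketch below computes the false positive rate separately for each group in a small set of predictions. It is plain Python and not the Fairness Indicators or TFMA API; the field names, the threshold and the data are hypothetical.

```python
from collections import defaultdict


def false_positive_rate_by_group(examples, threshold=0.5):
    """Return {group: FPR}, where FPR = false positives / actual negatives."""
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for ex in examples:
        if ex["label"] == 0:  # only actual negatives count towards the FPR
            negatives[ex["group"]] += 1
            if ex["score"] >= threshold:
                false_positives[ex["group"]] += 1
    return {group: false_positives[group] / n for group, n in negatives.items()}


# Hypothetical model scores for two groups, A and B.
examples = [
    {"group": "A", "label": 0, "score": 0.2},
    {"group": "A", "label": 0, "score": 0.7},
    {"group": "A", "label": 0, "score": 0.1},
    {"group": "A", "label": 0, "score": 0.4},
    {"group": "A", "label": 1, "score": 0.9},
    {"group": "B", "label": 0, "score": 0.8},
    {"group": "B", "label": 0, "score": 0.6},
    {"group": "B", "label": 0, "score": 0.3},
    {"group": "B", "label": 0, "score": 0.9},
    {"group": "B", "label": 1, "score": 0.4},
]

fpr = false_positive_rate_by_group(examples)
fpr_gap = max(fpr.values()) - min(fpr.values())
print(fpr, fpr_gap)
```

A large gap between groups at the same threshold is the kind of signal such an evaluation is meant to surface, and it can then be examined at different thresholds or on different slices.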
What-If Tool
For further evaluation of the model, the What-If Tool (WIT) can be used to deepen the analysis of a specific segment of data by inspecting model predictions at the datapoint level. The tool offers a wide range of features, from testing hypothetical situations on a datapoint, such as “what if this datapoint was from a different category?”, to visualising the importance of different data features to the model’s prediction.
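The "what if this datapoint was from a different category?" question boils down to editing one feature and comparing predictions. The sketch below shows that idea in plain Python. It does not use the What-If Tool itself, and the model is a hypothetical hand-written scorer whose features and weights are invented for this example.

```python
import math


def predict(datapoint):
    """Hypothetical stand-in model: a tiny hand-written logistic scorer."""
    weights = {"income": 0.00005, "age": 0.01}
    region_bias = {"urban": 0.5, "rural": -0.5}
    z = sum(weights[name] * datapoint[name] for name in weights)
    z += region_bias[datapoint["region"]] - 3.0
    return 1.0 / (1.0 + math.exp(-z))


def what_if(datapoint, feature, new_value):
    """Return (original, counterfactual) predictions after editing one feature."""
    edited = dict(datapoint, **{feature: new_value})  # copy; original is untouched
    return predict(datapoint), predict(edited)


# Hypothetical datapoint.
applicant = {"income": 40000, "age": 35, "region": "urban"}
original, counterfactual = what_if(applicant, "region", "rural")
print(f"urban: {original:.3f}  rural: {counterfactual:.3f}")
```

If a single categorical edit moves the score this much, that feature deserves a closer look, which is the kind of probing an interactive tool makes quick to repeat across many datapoints.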
TensorBoard
Visualisations are available via the TensorBoard platform, which is used to track training metrics.
ML Metadata
ML Metadata (MLMD) helps generate trackable artefacts throughout the development process. From ingesting training data and its metadata to exporting models with evaluation metrics, the MLMD API can create a trace of all the intermediate components of the ML workflow.
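The kind of trace described here can be pictured as a small lineage graph. The sketch below is not the MLMD API; it is a plain Python illustration with assumed class names, in which each pipeline step records its input and output artefacts so that a model can be traced back to the data it came from.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """Something a pipeline step consumes or produces (data, model, metrics)."""
    name: str
    kind: str
    properties: dict = field(default_factory=dict)


@dataclass
class Execution:
    """One run of a pipeline step, with the artefacts it read and wrote."""
    name: str
    inputs: list
    outputs: list


class LineageStore:
    """In-memory record of executions; a real store would persist these."""

    def __init__(self):
        self.executions = []

    def record(self, name, inputs, outputs):
        self.executions.append(Execution(name, list(inputs), list(outputs)))

    def trace(self, artifact):
        """Return the names of all upstream artefacts that led to `artifact`."""
        upstream = []
        for execution in self.executions:
            if artifact in execution.outputs:
                for parent in execution.inputs:
                    upstream.append(parent.name)
                    upstream.extend(self.trace(parent))
        return upstream


# Hypothetical two-step pipeline: ingestion, then training.
raw = Artifact("raw_data", "Dataset")
examples = Artifact("training_examples", "Examples")
model = Artifact("model_v1", "Model", {"accuracy": 0.91})

store = LineageStore()
store.record("ingest", inputs=[raw], outputs=[examples])
store.record("train", inputs=[examples], outputs=[model])

lineage = store.trace(model)
print(lineage)
```

With such a record in place, questions like "which dataset produced this model?" become a lookup rather than an investigation.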
Model Cards
A Model Card is a document to communicate the values and limitations of the model. Model Cards enable developers, policymakers, and users to understand aspects of trained models, contributing to the larger developer ecosystem with added clarity and explainability.
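A Model Card is, at its simplest, structured documentation that travels with the model. The sketch below is not the Model Card Toolkit API; the fields and values are hypothetical and show only how such a card might be captured as data and serialised to JSON for sharing.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ModelCard:
    """Hypothetical minimal card: what the model is for and where it falls short."""
    name: str
    intended_use: str
    limitations: list = field(default_factory=list)
    metrics: dict = field(default_factory=dict)


# All values below are invented for illustration.
card = ModelCard(
    name="screening-classifier",
    intended_use="Assist clinicians in prioritising cases; not a diagnosis.",
    limitations=["Not evaluated on patients under 18."],
    metrics={"accuracy": 0.91, "false_positive_rate": 0.08},
)

card_json = json.dumps(asdict(card), indent=2)
print(card_json)
```

Keeping the card as data rather than free text makes it easy to generate alongside each exported model and to render for different audiences.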