In a bid to tackle privacy issues, Google proposed federated learning in 2017. Federated learning is a distributed machine learning approach for training models on data held by edge devices such as smartphones, smartwatches and laptops, without first collecting that data in one place. Another technique is gossip learning, a decentralised alternative to federated learning.
While these approaches emphasise data privacy, choosing the right frameworks or libraries for securely training your model can become cumbersome. That is where a privacy-focused tool such as PySyft comes into play, since libraries such as PyTorch do not support federated learning out of the box. In simple terms, PySyft is a wrapper around PyTorch that adds privacy-preserving functionality to it.
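The two approaches differ mainly in who combines the locally trained models. The plain-Python sketch below treats a model as a flat list of parameters and contrasts a central server that averages client models weighted by local dataset size (in the spirit of federated averaging) with a simplified gossip step in which randomly paired peers average their models directly. The function names and the pairwise-averaging rule are illustrative assumptions; real systems add local training, communication and failure handling on top.

```python
import random
from typing import List

Model = List[float]  # a model reduced to a flat list of parameters


def federated_average(client_models: List[Model], client_sizes: List[int]) -> Model:
    """Server-side step: average parameters, weighting each client by its dataset size."""
    total = sum(client_sizes)
    return [
        sum(m[i] * n for m, n in zip(client_models, client_sizes)) / total
        for i in range(len(client_models[0]))
    ]


def gossip_round(peer_models: List[Model], rng: random.Random) -> None:
    """Simplified decentralised step: each peer averages with one random other peer.

    Assumes at least two peers. No server is involved; peers update in place.
    """
    n = len(peer_models)
    for i in range(n):
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1  # skip i so a peer never pairs with itself
        merged = [(a + b) / 2 for a, b in zip(peer_models[i], peer_models[j])]
        peer_models[i] = merged
        peer_models[j] = list(merged)


print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]

peers = [[0.0], [4.0], [8.0]]
rng = random.Random(0)
for _ in range(100):
    gossip_round(peers, rng)
print(peers)  # every peer ends up close to the mean, 4.0
```

Pairwise averaging leaves the peers' mean unchanged while pulling them together, so with equal dataset sizes the peers drift towards the model a central server would compute, without any coordinator.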
PyTorch vs PySyft
PyTorch is an open-source machine learning framework that facilitates building deep learning projects. It emphasises flexibility and allows deep learning models to be expressed in idiomatic Python. PyTorch can be used much like NumPy, SciPy or scikit-learn, but with strong GPU acceleration. Moreover, it supports dynamic computation graphs, allowing you to change how the network behaves on the fly, unlike the static graphs traditionally used by frameworks such as Google's TensorFlow.
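A minimal sketch of what "dynamic" means in practice, assuming a standard PyTorch install: ordinary Python control flow decides, from the values flowing through the network, how many operations run, and autograd differentiates whatever graph was actually built on that call. The `forward` and `grad_for` functions are illustrative, not part of any library.

```python
import torch


def forward(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # A plain Python loop decides how many times w is applied, based on
    # the data; autograd records only the operations that actually ran.
    h = x
    steps = 0
    while h.norm() < 10 and steps < 5:
        h = h * w
        steps += 1
    return h.sum()


def grad_for(values):
    """Run one forward/backward pass and return d(output)/dw."""
    w = torch.tensor(2.0, requires_grad=True)
    out = forward(torch.tensor(values), w)
    out.backward()
    return w.grad.item()


print(grad_for([1.0, 1.0]))  # loop runs three times: output 2 * w**3, gradient 24.0
print(grad_for([4.0, 4.0]))  # loop runs once: output 8 * w, gradient 8.0
```

The two calls produce graphs of different depth from the same code, which is the flexibility the paragraph above refers to.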
PySyft, on the other hand, is an open-source framework that enables secure, private computations in deep learning. It decouples private data from model training by using federated learning, differential privacy, homomorphic encryption (HE) and multi-party computation (MPC) within mainstream deep learning frameworks such as PyTorch, Keras and TensorFlow.
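Of the techniques listed, multi-party computation is perhaps the least intuitive. A common building block is additive secret sharing: a value is split into random shares, any incomplete set of which reveals nothing about it, yet parties can add shared values without ever reconstructing them. The plain-Python sketch below illustrates that generic idea; it is not PySyft's API, and the modulus `Q` and the function names are assumptions chosen for the example.

```python
import secrets
from typing import List

Q = 2**61 - 1  # assumed modulus for this sketch; real protocols fix their own


def share(secret: int, n_parties: int) -> List[int]:
    """Split an integer into n additive shares that sum to it modulo Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares


def reconstruct(shares: List[int]) -> int:
    """Only the full set of shares reveals the value."""
    return sum(shares) % Q


def add_shared(a: List[int], b: List[int]) -> List[int]:
    """Each party adds its own two shares locally; no party sees either secret."""
    return [(x + y) % Q for x, y in zip(a, b)]


a = share(20, 3)
b = share(22, 3)
print(reconstruct(add_shared(a, b)))  # 42
```

Multiplying shared values is harder and typically needs extra machinery, such as precomputed correlated randomness supplied by a third party.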
Most software libraries and frameworks let you compute only over information you own, inside machines you control. This means that you cannot compute on information without first obtaining ownership of that information or of the machines that hold it. According to the developers of PySyft, that limits human collaboration and systematically drives the centralisation of data, because you cannot work with data without putting it all in one place.
The Syft ecosystem aims to change this by allowing developers and researchers to write software that can compute over information they do not own, on machines they do not fully control, including servers in the cloud, personal desktops, mobile phones, laptops, websites and edge devices. "Wherever your data wants to live in your ownership, the 'Syft ecosystem' exists to help keep it in there while allowing it to be used privately for computation," said team OpenMined.
How does PySyft work?
The principles behind PySyft were first published in a research paper titled 'A generic framework for privacy-preserving deep learning', and its first implementation was led by OpenMined, an open-source community that aims to make the world more privacy-preserving by lowering the barrier to entry to private artificial intelligence technologies.
The main component of PySyft is an abstraction called the SyftTensor. A SyftTensor represents a state or transformation of the data, and SyftTensors can be chained together. The head of the chain is the PyTorch tensor, and the states or transformations embodied by the SyftTensors are accessed upward using the parent attribute and downward using the child attribute.
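The parent/child chain described above behaves like a doubly linked list of wrappers. The sketch below models only that navigation; the `ChainNode` class and the labels on each link are hypothetical and do not reproduce PySyft's actual classes, beyond borrowing the `parent` and `child` attribute names mentioned in the text.

```python
from typing import List, Optional


class ChainNode:
    """Hypothetical stand-in for one link in a tensor chain (not a PySyft class)."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.parent: Optional["ChainNode"] = None  # one step up, towards the head
        self.child: Optional["ChainNode"] = None   # one step down, away from the head

    def wrap(self, child: "ChainNode") -> "ChainNode":
        """Attach `child` below this node, link it back up, and return the child."""
        self.child = child
        child.parent = self
        return child


def labels_downward(head: ChainNode) -> List[str]:
    """Follow `child` links from the head to the end of the chain."""
    out: List[str] = []
    node: Optional[ChainNode] = head
    while node is not None:
        out.append(node.label)
        node = node.child
    return out


def labels_upward(tail: ChainNode) -> List[str]:
    """Follow `parent` links from the last link back to the head."""
    out: List[str] = []
    node: Optional[ChainNode] = tail
    while node is not None:
        out.append(node.label)
        node = node.parent
    return out


# Hypothetical labels: the head stands for the PyTorch tensor, the rest for states.
head = ChainNode("torch tensor (head)")
tail = head.wrap(ChainNode("syft state A")).wrap(ChainNode("syft state B"))
print(labels_downward(head))  # ['torch tensor (head)', 'syft state A', 'syft state B']
print(labels_upward(tail))    # ['syft state B', 'syft state A', 'torch tensor (head)']
```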
Using PySyft is relatively simple and feels much like working with standard PyTorch, Keras or TensorFlow. The visuals below illustrate a simple classification model built using PySyft.
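The federated pattern behind such a model can be sketched without PySyft itself: each data owner trains a copy of the global model on its own data, and only the updated parameters travel back to be averaged. The plain-Python example below does this for a one-feature logistic-regression classifier split across two hypothetical data owners, `alice` and `bob`. It illustrates the pattern only and is not PySyft's API; the learning rate, epoch count and number of rounds are arbitrary assumptions.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, int]   # (feature, label in {0, 1})
Model = Tuple[float, float]  # (weight, bias)


def predict_proba(model: Model, x: float) -> float:
    w, b = model
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def local_update(model: Model, data: List[Sample], lr: float = 0.5, epochs: int = 5) -> Model:
    """Runs on the data owner's side: gradient descent on the logistic loss."""
    w, b = model
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in data:
            err = predict_proba((w, b), x) - y
            gw += err * x
            gb += err
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    return (w, b)


def federated_train(workers: List[List[Sample]], rounds: int = 20) -> Model:
    """Server loop: send the model out, collect updates, average by dataset size."""
    model: Model = (0.0, 0.0)
    total = sum(len(d) for d in workers)
    for _ in range(rounds):
        # Only parameters come back; the raw samples never leave each worker.
        updates = [local_update(model, data) for data in workers]
        model = (
            sum(u[0] * len(d) for u, d in zip(updates, workers)) / total,
            sum(u[1] * len(d) for u, d in zip(updates, workers)) / total,
        )
    return model


alice = [(-2.0, 0), (-1.0, 0), (1.5, 1)]
bob = [(-1.5, 0), (1.0, 1), (2.0, 1)]
model = federated_train([alice, bob])
print([predict_proba(model, x) > 0.5 for x, _ in alice + bob])
```

In a real deployment each `local_update` call would run on the data owner's machine, which is the part a framework such as PySyft is meant to handle.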
Towards data privacy
Going forward, privacy will likely become one of the foundational building blocks of the next generation of deep learning frameworks. PySyft is one of the first attempts to enable privacy-preserving models in deep learning. The PyTorch team is also working along similar lines, running ML models on edge devices to preserve privacy, reduce latency and enable new interactive use cases.
For instance, PyTorch Mobile, which is still in beta, allows developers and researchers to go seamlessly from training a model to deploying it while staying entirely within the PyTorch ecosystem. It offers an end-to-end workflow that simplifies the path from research to production on mobile devices, while paving the way for privacy-preserving features via federated learning techniques.