CrypTen: A Research Tool for Secure and Privacy-Preserving Machine Learning in PyTorch



Aishwarya Verma

Facebook’s PyTorch created a huge buzz when it was released five years ago. It is now not only one of the most preferred frameworks for machine learning and deep learning models but also one of the most powerful research tools for building new libraries and frameworks (such as Hugging Face, fast.ai, etc.). One of the most captivating libraries released by Facebook’s AI Research lab (FAIR) is CrypTen, a tool for secure computation in ML. CrypTen is an open-source Python framework, built on PyTorch, that provides secure and privacy-preserving machine learning.

CrypTen uses secure multiparty computation (MPC) as its secure computing backend and narrows the gap between ML researchers/developers and cryptography by exposing encryption techniques through PyTorch-style APIs. In secure MPC, the data owner encrypts its data by using random masks to split it into n random shares (which can be combined to recover the original data). These n shares are then distributed among n parties. This process is called secret sharing. The parties can compute functions on the data by operating on the secret shares, and can decrypt the final result by communicating the resulting shares to each other.
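
The secret-sharing idea described above can be sketched in plain Python. This is a minimal illustration of additive secret sharing over the integers modulo 2^64, not CrypTen's implementation; the ring size and the helper names share, reconstruct, and add_shared are assumptions made for this example.

```python
import secrets

RING_SIZE = 2 ** 64  # assumed ring size for this sketch


def share(secret, n_parties):
    """Split an integer into n additive shares that sum to it modulo RING_SIZE."""
    shares = [secrets.randbelow(RING_SIZE) for _ in range(n_parties - 1)]
    last = (secret - sum(shares)) % RING_SIZE
    return shares + [last]


def reconstruct(shares):
    """Recombine the shares; this needs every party's share."""
    return sum(shares) % RING_SIZE


def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally, with no communication."""
    return [(a + b) % RING_SIZE for a, b in zip(a_shares, b_shares)]


x_shares = share(25, n_parties=3)
y_shares = share(17, n_parties=3)
z_shares = add_shared(x_shares, y_shares)
total = reconstruct(z_shares)
print(total)  # 42
```

Each individual share is a uniformly random number, so a party holding a single share learns nothing about the secret; only the sum of all shares reveals it.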

Installation

Install the CrypTen library from PyPI using pip.

!pip install crypten

DEMO – Basics of CrypTen

This demo covers the basics of the CrypTen library, such as creating encrypted tensors and performing operations on them.

  1. Import the CrypTen and torch libraries and initialize CrypTen.
import crypten
import torch
 
crypten.init()
  2. Create an encrypted tensor. CrypTen provides a cryptensor function (similar to torch.tensor) to convert a tensor into an encrypted tensor; to decrypt an encrypted tensor, use get_plain_text().
  • From a torch tensor.
# Create torch tensor
x = torch.tensor([1.0, 2.0, 3.0])
 
# Encrypt x
x_enc = crypten.cryptensor(x)
 
# Decrypt x to get original data
x_dec = x_enc.get_plain_text()   
print(x_dec)
  • From a Python list.
# Create a Python list
y = [4.0, 5.0, 6.0]
 
# Encrypt y
y_enc = crypten.cryptensor(y)
 
# Decrypt y
y_dec = y_enc.get_plain_text()
print(y_dec)
  3. Arithmetic operations on CrypTensors: Note that these operations never reveal any information about the encrypted tensors, and they return an encrypted tensor as output. An example of addition is shown below:
# Addition
# x_enc is the encrypted tensor created above.
# For this example, y is redefined as a plain float and y_enc as its encryption.
y = 2.0
y_enc = crypten.cryptensor(y)
 
# Addition of an encrypted tensor and a plaintext value
z_enc1 = x_enc + y      # Public
# Addition of two encrypted tensors
z_enc2 = x_enc + y_enc  # Private
print("\nPublic  addition:", z_enc1.get_plain_text())
print("Private addition:", z_enc2.get_plain_text())

You can check all arithmetic operations, here.
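
To make the difference between "public" and "private" addition concrete, here is a plain-Python sketch on two-party additive shares. It is an illustration under stated assumptions, not CrypTen's code: the 2^64 ring, the 2^16 fixed-point scale, and all helper names are hypothetical choices for this example.

```python
import secrets

RING = 2 ** 64   # assumed ring size
SCALE = 2 ** 16  # assumed fixed-point scale for representing floats as integers


def encode(value):
    """Map a float to a ring element using fixed-point scaling."""
    return int(round(value * SCALE)) % RING


def decode(element):
    """Map a ring element back to a signed float."""
    signed = element - RING if element >= RING // 2 else element
    return signed / SCALE


def share_two(element):
    """Split a ring element into two additive shares."""
    r = secrets.randbelow(RING)
    return [r, (element - r) % RING]


def add_public(shares, public_value):
    """Public addition: only one party (party 0 here) adds the known constant."""
    out = list(shares)
    out[0] = (out[0] + encode(public_value)) % RING
    return out


def add_private(a_shares, b_shares):
    """Private addition: each party adds the two shares it holds."""
    return [(s + t) % RING for s, t in zip(a_shares, b_shares)]


def reveal(shares):
    return decode(sum(shares) % RING)


x_shares = share_two(encode(1.5))
y_shares = share_two(encode(-4.0))
public_sum = reveal(add_public(x_shares, 2.0))
private_sum = reveal(add_private(x_shares, y_shares))
print(public_sum, private_sum)  # 3.5 -2.5
```

In both cases the parties only ever touch random-looking shares; the result becomes readable only after the final reveal step.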

  4. Comparisons between CrypTensors: Like arithmetic operations, comparisons also output a CrypTensor. When decrypted, the result contains 0s and 1s, corresponding to False and True respectively. One such operation is shown below:
# Less than
# x_enc is the encrypted tensor created above.
# For this example, y is redefined as a torch tensor and y_enc as its encryption.
y = torch.tensor([3.0, 2.0, 1.0])
y_enc = crypten.cryptensor(y)
 
# Comparison between an encrypted tensor and a torch tensor
z_enc1 = x_enc < y      # Public
# Comparison between two encrypted tensors
z_enc2 = x_enc < y_enc  # Private
print("\nPublic  (x < y) :", z_enc1.get_plain_text())
print("Private (x < y) :", z_enc2.get_plain_text())

You can check all the comparison operations here.

  5. Advanced mathematics: CrypTen provides MPC support for functions such as log, square root, tanh, etc. These functions may fail if the input is outside the supported range, and such a failure can only be detected once the values are decrypted. So be cautious when using these functions (it is recommended to normalize the data beforehand). An example is shown below.
# Reciprocal
#x is the torch tensor
#x_enc is the encrypted tensor of x
 
#reciprocal value of torch tensor
z = x.reciprocal()          # Public
#reciprocal value of encrypted tensor
z_enc = x_enc.reciprocal()  # Private
print("\nPublic  reciprocal:", z)
print("Private reciprocal:", z_enc.get_plain_text())

You can check all the operations here.
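
One way to see why such functions can fail outside a given input range is to approximate the reciprocal using only additions and multiplications, the kind of operations that are cheap on secret shares. The sketch below uses a Newton-Raphson iteration with a fixed starting guess; it is an illustration only, and the starting guess and iteration count are assumptions rather than CrypTen's actual settings.

```python
def newton_reciprocal(x, initial_guess=0.1, iterations=8):
    """Approximate 1/x with the iteration y <- y * (2 - x * y).

    The iteration converges only when 0 < x * initial_guess < 2,
    so with initial_guess=0.1 the usable input range is 0 < x < 20.
    """
    y = initial_guess
    for _ in range(iterations):
        y = y * (2.0 - x * y)
    return y


in_range = newton_reciprocal(4.0)       # inside the convergence range
out_of_range = newton_reciprocal(30.0)  # outside it: the iteration diverges

print(abs(in_range - 0.25) < 1e-9)           # True
print(abs(out_of_range - 1.0 / 30.0) > 1.0)  # True
```

Because the loop never inspects the value of x, nothing signals the divergence while the data is encrypted; the wrong answer shows up only after decryption, which is why normalizing inputs beforehand helps.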

  6. Control flow using CrypTensors: CrypTensors cannot be used directly in conditional expressions, because the result of a comparison is itself encrypted and so cannot be evaluated as a Python boolean. An example is shown below. This code raises a RuntimeError because it applies a condition to encrypted tensors.
# Normal control-flow code will raise an error
# a and b are two integers; x_enc and y_enc are the CrypTensors from above
a, b = 2, 3
try:
    if x_enc < y_enc:
        z = a
    else:
        z = b
except RuntimeError as error:
    print(f"RuntimeError caught: \"{error}\"\n")

However, the same selection can be written as a mathematical expression, as shown below:

# Instead, use a mathematical expression
# (a, b, x_enc and y_enc are as defined above)
use_a = (x_enc < y_enc)
z_enc = use_a * a + (1 - use_a) * b
print("z:", z_enc.get_plain_text())

You can check this part, here.
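
The arithmetic trick used above can be checked on plaintext values. The following sketch (plain Python, with hypothetical names) shows that multiplying by a 0/1 mask selects between two values without any branch on the condition, which is what allows the same expression to work on encrypted masks.

```python
def oblivious_select(condition_bit, a, b):
    """Return a when condition_bit is 1 and b when it is 0, without an if statement."""
    return condition_bit * a + (1 - condition_bit) * b


x = [1.0, 2.0, 3.0]
y = [3.0, 2.0, 1.0]
a, b = 10, 20

# Element-wise comparison expressed as 0/1 values, as a decrypted comparison would be
bits = [int(xi < yi) for xi, yi in zip(x, y)]
selected = [oblivious_select(bit, a, b) for bit in bits]
print(bits)      # [1, 0, 0]
print(selected)  # [10, 20, 20]
```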

  7. Advanced indexing: Encrypted tensors can be indexed, concatenated, stacked, reshaped, etc. (check the full list here). An example is shown below.
# Indexing
z_enc = x_enc[:-1]
print("Indexing:\n", z_enc.get_plain_text())

You can check all the advanced operations here.

You can check the full demo here.

To learn more about the internal workings of CrypTensors, you can check this notebook.

DEMO – Application of CrypTen in different Scenarios

In this demo, we will demonstrate four important applications of CrypTen by learning a linear SVM model. In all the cases, we will consider a two-party setting.

Initial Setup 

After installing the CrypTen library, clone the GitHub repository and move mnist_utils.py (a helper script that downloads and splits the data) to the current working directory. We then create a dummy dataset for training a linear SVM to perform binary classification. The dummy dataset contains 1000 ground-truth samples with 100 features each, labeled by a randomly generated hyperplane that separates the positive and negative examples. We also generate 100 test samples, so that we can check that the model has learned the pattern rather than memorized the data. The code snippet is available here. We name the two parties Alice and Bob. Since we are using a multiprocess decorator, Alice has rank 0 and Bob has rank 1.
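
As a rough picture of what such a dummy dataset looks like, here is a plain-Python sketch that labels random points by a random hyperplane. It is not the article's linked snippet: the function name, the seed, and the (features, examples) layout are assumptions made for this illustration.

```python
import random


def make_linear_dataset(num_features, num_examples, seed=0):
    """Generate random features and label each example by a random hyperplane."""
    rng = random.Random(seed)
    w_true = [rng.gauss(0, 1) for _ in range(num_features)]
    b_true = rng.gauss(0, 1)
    # features[i][j] is the value of feature i for example j
    features = [[rng.gauss(0, 1) for _ in range(num_examples)]
                for _ in range(num_features)]
    labels = []
    for j in range(num_examples):
        score = sum(w_true[i] * features[i][j] for i in range(num_features)) + b_true
        labels.append(1.0 if score >= 0 else -1.0)
    return features, labels, w_true, b_true


features, labels, w_true, b_true = make_linear_dataset(100, 1000)
print(len(features), len(features[0]), len(labels))  # 100 1000 1000
```

Because the labels come from a known hyperplane, a linear SVM should be able to fit them, which makes the dataset convenient for checking that encrypted training behaves like plaintext training.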

Now, split the features and labels between Alice and Bob, and save the datasets for the different applications discussed below. The code for this is available here.

Now that we have generated our dataset, we will train the SVM model in four scenarios involving Alice and Bob (the two-party setting):

Data Labeling: In this case, Alice has access to the features, while Bob has access to the labels. We encrypt the features from Alice and the labels from Bob, then train the linear SVM on the encrypted data.

import crypten.mpc as mpc

@mpc.run_multiprocess(world_size=2)
def data_labeling_example():
    """Apply data labeling access control model"""
    # To indicate the source of an encrypted tensor, we pass the keyword
    # argument src to crypten.load() (from a file) or crypten.cryptensor()
    # (from a tensor). src takes the rank of the party that provides the
    # data (recall that ALICE is 0 and BOB is 1).
    # Alice loads features, Bob loads labels
    features_enc = crypten.load(filenames["features"], src=ALICE)
    labels_enc = crypten.load(filenames["labels"], src=BOB)
    
    # Execute training
    w, b = train_linear_svm(features_enc, labels_enc, epochs=epochs, lr=lr)
    
    # Evaluate model
    evaluate_linear_svm(test_features, test_labels, w, b)
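
The helpers train_linear_svm and evaluate_linear_svm are defined in the linked demo and are not reproduced in this article. As a plaintext stand-in, the sketch below trains a linear SVM by subgradient descent on the hinge loss; the function names, the toy data, and the per-example data layout are assumptions for illustration and do not mirror the demo's exact code.

```python
def train_linear_svm_plain(examples, labels, epochs=10, lr=0.5):
    """Subgradient descent on the average hinge loss max(0, 1 - y * (w.x + b))."""
    num_features = len(examples[0])
    n = len(examples)
    w = [0.0] * num_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * num_features
        grad_b = 0.0
        for x, y in zip(examples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # only margin violations contribute to the gradient
                for i in range(num_features):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def accuracy(examples, labels, w, b):
    """Fraction of examples whose predicted sign matches the label."""
    correct = 0
    for x, y in zip(examples, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += (1.0 if score >= 0 else -1.0) == y
    return correct / len(examples)


# Toy, clearly separable data: four positive and four negative points
examples = [[2, 2], [3, 1], [2, 3], [1, 2.5],
            [-2, -2], [-3, -1], [-1, -3], [-2.5, -1]]
labels = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]

w, b = train_linear_svm_plain(examples, labels)
print(accuracy(examples, labels, w, b))  # 1.0
```

Note that the plaintext version branches on the margin; on encrypted data the same selection would have to be expressed as a multiplication by a 0/1 mask, since encrypted comparisons cannot drive an if statement.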

Feature Aggregation: Alice has access to the first 50 features, while Bob has access to the last 50 features. In this case, we train the linear SVM on the combined data while each party's data stays private from the other. So, we concatenate Alice's and Bob's encrypted feature sets and pass the result to the linear SVM.

@mpc.run_multiprocess(world_size=2)
def feature_aggregation_example():
    """Apply feature aggregation access control model"""
    # Alice loads some features, Bob loads other features
    # (crypten.load() encrypts a tensor loaded from a file)
    features_alice_enc = crypten.load(filenames["features_alice"], src=ALICE)
    features_bob_enc = crypten.load(filenames["features_bob"], src=BOB)
    
    # Concatenate along the feature dimension
    features_enc = crypten.cat([features_alice_enc, features_bob_enc], dim=0)
    
    # Encrypt labels
    labels_enc = crypten.cryptensor(labels)
    
    # Execute training
    w, b = train_linear_svm(features_enc, labels_enc, epochs=epochs, lr=lr)
    
    # Evaluate model
    evaluate_linear_svm(test_features, test_labels, w, b)

Dataset Augmentation: Alice has access to the first 500 examples, while Bob has access to the last 500 examples. This scenario can occur in applications where several parties each have access to a small amount of sensitive data, and no individual party has enough data to train an accurate model. The main difference from Feature Aggregation is that we concatenate over the other dimension (the sample dimension rather than the feature dimension).

@mpc.run_multiprocess(world_size=2)
def dataset_augmentation_example():
    """Apply dataset augmentation access control model""" 
    # Alice loads some samples, Bob loads other samples
    samples_alice_enc = crypten.load(filenames["samples_alice"], src=ALICE)
    samples_bob_enc = crypten.load(filenames["samples_bob"], src=BOB)
    
    # Concatenate along the sample dimension
    samples_enc = crypten.cat([samples_alice_enc, samples_bob_enc], dim=1)
    
    labels_enc = crypten.cryptensor(labels)
    
    # Execute training
    w, b = train_linear_svm(samples_enc, labels_enc, epochs=epochs, lr=lr)
    
    # Evaluate model
    evaluate_linear_svm(test_features, test_labels, w, b)
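
The two concatenation directions used in Feature Aggregation (dim=0) and Dataset Augmentation (dim=1) can be pictured with a small plaintext matrix. This sketch assumes, as the dim arguments above suggest, that rows are features and columns are examples; the helper names are hypothetical.

```python
def cat_rows(top, bottom):
    """Concatenate along dimension 0: stack the rows of two matrices."""
    return top + bottom


def cat_cols(left, right):
    """Concatenate along dimension 1: join the columns of two matrices."""
    return [l_row + r_row for l_row, r_row in zip(left, right)]


# 4 features x 6 examples; entry 10 * i + j marks feature i of example j
full = [[10 * i + j for j in range(6)] for i in range(4)]

# Feature aggregation: each party holds half of the features (rows)
alice_features, bob_features = full[:2], full[2:]

# Dataset augmentation: each party holds half of the examples (columns)
alice_samples = [row[:3] for row in full]
bob_samples = [row[3:] for row in full]

print(cat_rows(alice_features, bob_features) == full)  # True
print(cat_cols(alice_samples, bob_samples) == full)    # True
```

In the encrypted setting the same recombination happens on secret shares, so neither party sees the other's rows or columns in the clear.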

Model Hiding: Alice has access to w_true and b_true, while Bob has access to the data samples to be classified. Here, Alice has a pre-trained model that cannot be revealed, while Bob would like to use this model to evaluate his private data sample(s). Hence, we encrypt the weights and bias from Alice's model and the test features from Bob.

@mpc.run_multiprocess(world_size=2)
def model_hiding_example():
    """Apply model hiding access control model"""
    # Alice loads the model
    w_true_enc = crypten.load(filenames["w_true"], src=ALICE)
    b_true_enc = crypten.load(filenames["b_true"], src=ALICE)
    
    # Bob loads the features to be evaluated
    test_features_enc = crypten.load(filenames["test_features"], src=BOB)
    
    # Evaluate model
    evaluate_linear_svm(test_features_enc, test_labels, w_true_enc, b_true_enc)
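
Model hiding requires multiplying two private values (Alice's weights and Bob's features), which is harder than adding shares. The article does not describe the protocol used, so the sketch below shows one standard technique, Beaver triples supplied by a trusted dealer, purely as an illustration. It works on integers modulo 2^64 and all names are hypothetical.

```python
import secrets

RING = 2 ** 64  # assumed ring size


def share(value):
    """Split a ring element into two additive shares."""
    r = secrets.randbelow(RING)
    return [r, (value - r) % RING]


def reveal(shares):
    return sum(shares) % RING


def beaver_multiply(x_shares, y_shares):
    """Multiply two secret-shared values using a random triple (a, b, c = a * b)."""
    # A trusted dealer (an assumption of this sketch) distributes shares of the triple.
    a, b = secrets.randbelow(RING), secrets.randbelow(RING)
    a_sh, b_sh, c_sh = share(a), share(b), share(a * b % RING)
    # The parties open e = x - a and d = y - b; a and b are uniformly random,
    # so the opened values reveal nothing about x or y.
    e = reveal([(x - ai) % RING for x, ai in zip(x_shares, a_sh)])
    d = reveal([(y - bi) % RING for y, bi in zip(y_shares, b_sh)])
    # Local computation: z = c + e*b + d*a + e*d equals x*y once the shares are summed.
    z = [(c + e * bi + d * ai) % RING for c, ai, bi in zip(c_sh, a_sh, b_sh)]
    z[0] = (z[0] + e * d) % RING  # the public term e*d is added by one party only
    return z


weights = [3, 5, 2]  # Alice's private model (toy integers)
sample = [4, 1, 6]   # Bob's private input (toy integers)

score_shares = [0, 0]
for w, x in zip(weights, sample):
    product = beaver_multiply(share(w), share(x))
    score_shares = [(s + p) % RING for s, p in zip(score_shares, product)]

score = reveal(score_shares)
print(score)  # 29
```

Only the final score is revealed; the weights and the sample stay hidden behind random shares throughout the computation.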

You can check the full demo here.

DEMO – Classification with Encrypted Neural Network

In this section, we discuss Model Hiding in detail. Alice has a pre-trained model that she wants to keep private, while Bob has data samples to be classified that he wants to keep private. In the implementation, we use a multilayer perceptron (MLP) that classifies the MNIST digit dataset. This pre-trained model belongs to Alice; we encrypt her model and Bob's dataset to perform the required classification.

Initial Setup

Import all the required libraries and packages and initialize CrypTen. Split the dataset into Alice's and Bob's portions using the helper script mnist_utils.py. The code for copying this helper script is mentioned here, and the code for importing CrypTen and torch is available here. Then define the neural network architecture and a function to calculate the accuracy. The code snippet for the neural network architecture is available here. A simple pipeline of this demo is shown below:

Source : https://crypten.ai/

Encrypting Pre-trained Model

In this step, set the ranks for Alice and Bob and encrypt Alice’s pre-trained model. The code is shown below.

# Load the pre-trained model to Alice
dummy_model = AliceNet()
# Load a PyTorch model from file on the source party; it is converted
# to a CrypTen model and encrypted in the steps below.
plaintext_model = crypten.load('/content/CrypTen/tutorials/models/tutorial4_alice_model.pth', dummy_model=dummy_model, src=ALICE)
 
# Encrypt the model from Alice:
 
# 1. Create a dummy input with the same shape as the model input
dummy_input = torch.empty((1, 784))
 
# 2. Construct a CrypTen network from the trained model and dummy_input.
# from_pytorch takes the plaintext model and a dummy input as arguments;
# the dummy input dimensions must match those of the model's input.
private_model = crypten.nn.from_pytorch(plaintext_model, dummy_input)
 
# 3. Encrypt the CrypTen network with src=ALICE
private_model.encrypt(src=ALICE)
 
# Check that the model is encrypted:
print("Model successfully encrypted:", private_model.encrypted)

Classifying Bob’s Encrypted Data with Alice’s Encrypted Model

For this step, encrypt Bob’s dataset as we did in the Model Hiding section of the previous demo. The code for this step is the same as discussed above and is available here.

Validating Encrypted Model

For the last step, we check whether the encrypted output from the previous step decrypts to meaningful labels. You can check the code snippet and output here.

In the same way, we can create more complex encrypted models such as LeNet, ResNet, etc.


You can check the full demo here.

DEMO – Training an Encrypted Neural Network

In this section, we set up the training of an encrypted neural network with CrypTen. We implement a classification model on the MNIST dataset. We divide the 28 x 28 features of each example into two parts, then combine the encrypted parts to train a neural network while keeping each party's part private. Let’s assume Alice has the first 28 x 20 features and Bob has the last 28 x 8 features. One way to think of this split is that Alice has (roughly) the top two-thirds of each image, while Bob has the bottom third.

Initial Setup

Import all the required libraries and packages and initialize CrypTen. For tutorial purposes, we implement a small binary classification network (i.e., a zero versus non-zero digit classifier) and take a small dataset of 100 examples using the helper script mnist_utils.py. The code for copying this helper script is mentioned here, and the code for importing CrypTen and torch is available here. Then define your own neural network architecture. The code snippet for the neural network architecture is available here.

Encrypted Training 

There are two main differences between PyTorch training and CrypTen training.

(1) Use one-hot encoding: CrypTen training requires all labels to use a one-hot encoding. This means that when using standard datasets such as MNIST, we need to modify the labels to use a one-hot encoding.

(2) Directly update parameters: CrypTen does not use the PyTorch optimizers. Instead, CrypTen implements encrypted SGD by implementing its own backward function, followed by directly updating the parameters. We now show some small examples to illustrate these differences.
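
The two differences listed above can be illustrated on plaintext values. The sketch below is not CrypTen code: one_hot and sgd_step are hypothetical helpers that show what a one-hot label looks like and what "directly updating the parameters" means once gradients are available.

```python
def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    return [[1.0 if k == label else 0.0 for k in range(num_classes)]
            for label in labels]


def sgd_step(parameters, gradients, learning_rate):
    """Plain SGD without an optimizer object: p <- p - lr * grad."""
    return [p - learning_rate * g for p, g in zip(parameters, gradients)]


encoded = one_hot([0, 1, 1], num_classes=2)
updated = sgd_step([0.5, -1.0], [0.25, -0.5], learning_rate=0.5)
print(encoded)  # [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(updated)  # [0.375, -0.75]
```

Both operations use only indexing, multiplication, and subtraction, which is why they carry over to encrypted tensors without needing a PyTorch optimizer.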

As before, we assume Alice has the rank 0 process and Bob has the rank 1 process. First, load Alice's training dataset.

# Load Alice's data 
data_alice_enc = crypten.load('/tmp/alice_train.pth', src=ALICE)

For illustration purposes, we create a dummy dataset of the same size as the original, i.e., a feature tensor of size 100 x 1 x 28 x 28 along with 100 labels, and convert the labels to a one-hot encoding. After transforming all the data (features and labels) to CrypTensors, instantiate the model and convert it to a CrypTen model via crypten.nn.from_pytorch. Finally, encrypt the model with model.encrypt(). The code below shows this process.

# We'll now set up the data for our small example below
# For illustration purposes, we will create dummy data
# and encrypt all of it from source ALICE
 
# torch.rand returns a tensor of random numbers drawn uniformly from [0, 1)
x_small = torch.rand(100, 1, 28, 28)
# Random binary labels in {0, 1}; torch.randint(1, ...) would yield only zeros
y_small = torch.randint(0, 2, (100,))
# Transform labels into one-hot encoding
label_eye = torch.eye(2)
y_one_hot = label_eye[y_small]
 
# Transform all data to CrypTensors
x_train = crypten.cryptensor(x_small, src=ALICE)
y_train = crypten.cryptensor(y_one_hot)
 
# Instantiate and encrypt a CrypTen model
model_plaintext = ExampleNet()
dummy_input = torch.empty(1, 1, 28, 28)
# from_pytorch takes the plaintext model and a dummy input as arguments;
# the dummy input dimensions must match those of the model's input.
model = crypten.nn.from_pytorch(model_plaintext, dummy_input)
model.encrypt()

Now we train the model and calculate the loss, just as in PyTorch training. The code snippet is available here.

Complete Training Example

The methodology is the same as for the dummy data, with one slight variation: we combine the two encrypted datasets (Alice’s and Bob’s). This step is done before encrypting the model. The code is shown below:

# Load data:
x_alice_enc = crypten.load('/tmp/alice_train.pth', src=ALICE)
x_bob_enc = crypten.load('/tmp/bob_train.pth', src=BOB)

# Combine the feature sets: identical to Tutorial 3
x_combined_enc = crypten.cat([x_alice_enc, x_bob_enc], dim=2)

# Reshape to match the network architecture
x_combined_enc = x_combined_enc.unsqueeze(1)

# Initialize a plaintext model and convert to CrypTen model
model = crypten.nn.from_pytorch(ExampleNet(), dummy_input)
model.encrypt()

The rest of the training steps are the same as in the dummy dataset example. The code snippet is available here.

You can check the full demo here.

EndNotes

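In this article, we discussed FAIR’s CrypTen framework and walked through four demos: the basics of CrypTen, applications of CrypTen in different scenarios, classification with an encrypted neural network, and training an encrypted neural network.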

