MITB Banner

Complete Guide to DSNE: A Velocity Visualization Tool

DSNE handles high-dimensional velocity visualization problems such as cell differentiation and embryo development with greater accuracy

Share

DSNE velocity visualization

Illustration by Workplace with notebook on black background

Handling high-dimensional data is a non-trivial task in Exploratory Data Analysis. Numerous tools ranging from the famous Principal Component Analysis (PCA) to the Stochastic Neighbour Embedding (SNE) variants focus on embedding the high-dimensional data on a low-dimensional visualization, especially in a two-or three-dimensional representation. The vanilla Stochastic Neighbor Embedding converts the high-dimensional Euclidean distance from a point to its K-nearest neighbors into conditional probabilities that represent similarities. Stochastic Neighbour Embedding variants are developed for task-specific applications, including t-SNE, Hierarchical-SNE, RNA-velocity techniques, and UMAP. 

Biological science often deals with high-dimensional velocity data such as embryo development, cell fluid transfer, cell-type transition, cell differentiation, and mRNA velocity. Recent work by Volker Bergen, et al., the scVelo has introduced a powerful velocity visualization in a transient cell system. However, the method relies on an intuitive approach rather than a solid mathematical approach. This method yields unreliable results for highly dynamic and unpredictable cell systems such as messenger-RNA (mRNA) velocity visualization.

Songting Shi of the Peking University, China, has developed a mathematically sound approach to visualize cell components’ velocity. This method is named as DSNE, the abbreviation for Directional Stochastic Neighbor Embedding. It can be viewed as a variant of the vanilla SNE proposed to handle velocity visualization problems such as cell differentiation and embryo development.

DSNE converts the high-dimensional Euclidean distance between the unit-length velocity and the unit-length direction from the point of interest to the nearest neighbor points into a conditional probability distribution. This conditional probability distribution represents the similarity between the velocity and the direction of cell particles. 

Mathematically, the conditional probability can be expressed as:

DSNE high dimensional representation

Similarly, the conditional probability distribution for the dimensionally-reduced low-dimensional counterparts can be mathematically expressed as:

DSNE low dimensional representation

An entropy loss function (the Kullback-Leibler Divergence loss) is incorporated to measure the impurity between the high-dimensional representations and the low-dimensional representations. DSNE is optimized during training by reducing this loss function. The loss function can be expressed mathematically as:

DSNE loss function

Pancreas Visualization using DSNE 

Pancreas is one of the sensitive and vital organs in the human body. Pancreas looks after food digestion and blood sugar regulation. Pancreas cells possess highly dynamic functionality inter- and intra-cells. Timely study of Pancreas leads to better medication in case of impairment. Pancreas cell velocity can be analyzed and visualized using DSNE’s python package. 

  1. DSNE is now available as a PyPi package. We can simply pip install it.
!pip install dsne
  1. Create the development environment by importing the necessary modules
 import numpy as np
 import scvelo as scv
 from scipy.sparse import issparse
 from dsne import DSNE, DSNE_approximate 
  1. Configure the visualization settings to suit the application and download the built-in pancreas dataset.
 scv.settings.verbosity = 3  # show errors(0), warnings(1), info(2), hints(3)
 scv.settings.presenter_view = True  # set max width size for presenter view
 scv.settings.set_figure_params('scvelo')  # for beautified visualization
 adata = scv.datasets.pancreas()
 scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
 scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
 scv.tl.velocity(adata) 

Output:

  1. Define a helper function to obtain the velocity components that are required to plot pancreas velocity.
 def get_X_V_Y(adata,vkey="velocity",
               xkey="Ms",
               basis=None,
               gene_subset=None,
               ):
     subset = np.ones(adata.n_vars, bool)
     if gene_subset is not None:
         var_names_subset = adata.var_names.isin(gene_subset)
         subset &= var_names_subset if len(var_names_subset) > 0 else gene_subset
     elif f"{vkey}_genes" in adata.var.keys():
         subset &= np.array(adata.var[f"{vkey}_genes"].values, dtype=bool)
     xkey = xkey if xkey in adata.layers.keys() else "spliced"
     basis = 'umap' if basis is None else basis
     X = np.array(
         adata.layers[xkey].A[:, subset]
         if issparse(adata.layers[xkey])
         else adata.layers[xkey][:, subset]
     )
     V = np.array(
         adata.layers[vkey].A[:, subset]
         if issparse(adata.layers[vkey])
         else adata.layers[vkey][:, subset]
     )
     # V -= np.nanmean(V, axis=1)[:, None]
     Y =np.array(
         adata.obsm[f"X_{basis}"]
     )
     nans = np.isnan(np.sum(V, axis=0))
     if np.any(nans):
         X = X[:, ~nans]
         V = V[:, ~nans]
     return X,V,Y 
  1. Get the two-dimensional velocity embedding of high-dimensional pancreas velocity data with standard DSNE.
 X,V,X_2d = get_X_V_Y(adata,vkey="velocity",xkey="Ms",basis="umap")
 V_2d = DSNE(X, V, Y=X_2d,
             perplexity=3.0,
             K=16,
             threshold_V=1e-8,
             separate_threshold=1e-8,
             max_iter=600,
             mom_switch_iter=250,
             momentum=0.5,
             final_momentum=0.8,
             eta=0.1,
             epsilon_kl=1e-16,
             epsilon_dsne=1e-16,
             seed=6,
             random_state=None,
             copy_data=False,
             with_norm=True,
             verbose=True) 
  1. Plot the standard DSNE velocity data in a two-dimensional plot.
 adata.obsm["X_DSNE"] = X_2d
 adata.obsm["V_DSNE"] = V_2d
 title ="DSNE"
 scv.pl.velocity_embedding_stream(adata, title=title+' stream', basis='umap',V=adata.obsm["V_DSNE"], smooth=0.5,density=2,)
 scv.pl.velocity_embedding_grid(adata, title=title+' grid' , basis='umap',V=adata.obsm["V_DSNE"], smooth=0.5,density=2,)
 scv.pl.velocity_embedding(adata,  title=title+' embedding',basis='umap',V = adata.obsm["V_DSNE"]) 

Output: 

Pancreas DSNE stream
Pancreas DSNE grid
Pancreas DSNE embedding
  1. Plot the compute-efficient DSNE approximate version of the above plots.
 title ="DSNE-approximate"
 V_2d = DSNE_approximate(X, V, Y=X_2d,
                         perplexity=3.0,
                         K=16,
                         threshold_V=1e-8,
                         separate_threshold=1e-8,
                         seed=6,
                         random_state=None,
                         copy_data=False,
                         with_norm=True,
                         verbose=True)
 adata.obsm["X_DSNE_approximate"] = X_2d
 adata.obsm["V_DSNE_approximate"] = V_2d
 scv.pl.velocity_embedding_stream(adata, basis='umap',V=adata.obsm["V_DSNE_approximate"],  title=title+' stream', smooth=0.5,density=2,)
 scv.pl.velocity_embedding_grid(adata, basis='umap',V=adata.obsm["V_DSNE_approximate"],  title=title+' grid', smooth=0.5,density=2,)
 scv.pl.velocity_embedding(adata, basis='umap',V = adata.obsm["V_DSNE_approximate"], title=title+' embedding') 

Output:

Pancreas DSNE approx stream
Pancreas DSNE approx grid
Pancreas DSNE approx embedding

Comparison of DSNE with scVelo on Exact Simulation

  1. Import the necessary packages and data; and configure hyper-parameters.
 import os
 import numpy as np
 import scvelo as scv
 from anndata import AnnData, read_h5ad
 from dsne import DSNE, DSNE_approximate
 N=500
 D=300
 d=2
 K=16
 perplexity =6
 n_rep=1
 exact = False
 with_norm = True
 basis = 'exact_embeddings'
 verbose = False 
  1. Define helper functions for determining unit length and velocity accuracy.
 def unitLength(V):
     V_ = V/np.sqrt(np.sum(V*V,axis=1,keepdims=True))
     return V_
 def velocity_accuracy(V, V_exact):
     V_unit = unitLength(V)
     V_exact_unit = unitLength(V_exact)
     accu = np.sum( V_unit* V_exact_unit )/(V.shape[0]*1.)
     return accu 
  1. Define a function to preprocess and simulate the built-in data.
 def simulate_data(N=50, D=3, d=2, save =True, file_name_prefix ="./data" ):
     if not os.path.exists(file_name_prefix):
         print("Directory: {} do not exist, create it! \n".format(os.path.abspath(file_name_prefix)))
         os.makedirs(os.path.abspath(file_name_prefix))
     V_2d = np.random.randn(*(N * 3, d)) * 6
     err_2d = np.random.randn(*(N * 3, d))*2
     x_1 = np.asarray([0, ] * d)
     x_2 = np.asarray([50, ] * d)
     x_3 = np.asarray([160, ] * d)
     X_2d = np.zeros_like(V_2d)
     X_2d[0, :] = x_1
     X_2d[N, :] = x_2
     X_2d[N * 2, :] = x_3
     for i in np.arange(N - 1):
         X_2d[i + 1, :] = X_2d[i, :] + V_2d[i, :] + err_2d[i,:]
         X_2d[i + N + 1, :] = X_2d[i + N, :] + V_2d[i + N, :] + err_2d[i + N, :]
         X_2d[i + N * 2 + 1, :] = X_2d[i + N * 2, :] + V_2d[i + N * 2, :] +  err_2d[i + N * 2, :]
     y = np.asarray([0, ] * N + [1, ] * N + [2, ] * N)
     U = np.array(np.random.randn(*(d, D)))
     X = X_2d.__matmul__(U)
     V = V_2d.__matmul__(U)
     adata = AnnData(X=X, layers={"velocity": V},obs={"clusters": y}, obsm={"X_exact_embeddings":X_2d, "V_exact_embeddings":V_2d})
     if save:
         file_name = file_name_prefix+"simulated_data_N_{}_D_{}.h5hd".format(N,D)
         adata.write_h5ad(file_name)
     return adata 
  1. Process the data for simulation and prepare it for two-dimensional plotting.
 adata = simulate_data(N=N,D=D,d=d,save=False)
 X = adata.X
 V = adata.layers["velocity"]
 X_basis = f"X_{basis}"
 X = np.asarray(X, dtype=np.float64)
 V = np.asarray(V, dtype=np.float64)
 Y = None
 if (X_basis in adata.obsm.keys()) and adata.obsm[X_basis] is not None:
   Y = adata.obsm[f"X_{basis}"]
 if Y is None:
   print("Do not get the low dimesnional embedding Y! \n")
   # raise
 Y = np.asarray(Y, dtype=np.float64) 
  1. Plot the simulation with the recent model, scVelo and determine the accuracy.
 adata_tmp = AnnData(X=X, obsm={"X_umap": Y}, layers={"velocity": V, "spliced": X})
 scv.tl.velocity_graph(adata_tmp, xkey='spliced')
 scv.tools.velocity_embedding(adata_tmp, basis="umap")
 W = adata_tmp.obsm["velocity_umap"]
 vkey = "velocity_scvelo_original"
 str_exact = "exact" if exact else "approx"
 method = 'scvelo_velocity_original'
 adata.obsm[f"{vkey}_{str_exact}_{basis}"] = W
 W_exact = adata.obsm["V_exact_embeddings"]
 accu = velocity_accuracy(W, W_exact)
 print(f"  {method}, {str_exact},  accu: {accu}\n")
 method_str = "scVelo"
 title = "{} on exact embeddings with accuracy {:5.3f}".format(method_str, accu)
 scv.pl.velocity_embedding(adata, basis=basis, V=W, title=title,density=2,)
 scv.pl.velocity_embedding_stream(adata, basis=basis, V=W, title=title,density=2,)
 scv.pl.velocity_embedding_grid(adata, basis=basis, V=W, title=title,) 

Output:

  1. Plot the exact simulation visualization with DSNE-Approximate version and determine the accuracy.
 W = DSNE_approximate(X, V, Y=Y,
                                       perplexity=perplexity,
                                       pca_d=None,
                                       threshold_V=1e-8,
                                       separate_threshold=1e-8,
                                       seed=16,
                                       random_state=None,
                                       copy_data=False,
                                       verbose=verbose)
 vkey = "velocity_scvelo"
 str_exact = "exact" if exact else "approx"
 method = "DSNE_approximate"
 adata.obsm[f"{vkey}_{str_exact}_{basis}"] = W
 W_exact = adata.obsm["V_exact_embeddings"]
 accu = velocity_accuracy(W, W_exact)
 print(f"  {method}, {str_exact},  accu: {accu}\n")
 method_str = "DSNE-approximate"
 title = "{} on exact embeddings with accuracy {:5.3f}".format(method_str, accu)
 scv.pl.velocity_embedding(adata, basis=basis, V=W, title=title,density=2,)
 scv.pl.velocity_embedding_stream(adata, basis=basis, V=W, title=title,density=2,)
 scv.pl.velocity_embedding_grid(adata, basis=basis, V=W, title=title,) 

Output:

DSNE approx embeddings
DSNE approx velocity
DSNE approx grid
  1. Plot the exact simulation visualization with standard DSNE and determine the accuracy.
 W = DSNE(X, V, Y=Y,
                  K= K,
                  perplexity=perplexity,
                  pca_d=None,
                  threshold_V=1e-8,
                  separate_threshold=1e-8,
                  max_iter=1000,
                  mom_switch_iter=250,
                  momentum=0.5,
                  final_momentum=0.8,
                  eta=0.1,
                  epsilon_kl=1e-16,
                  epsilon_dsne=1e-16,
                  with_norm=with_norm,
                  seed=16,
                  random_state=None,
                  copy_data=True,
                  verbose=verbose)
 vkey = "velocity_dsne"
 method = 'DSNE'
 str_exact = "exact" if exact else "approx"
 adata.obsm[f"{vkey}_{str_exact}_{basis}"] = W
 W_exact = adata.obsm["V_exact_embeddings"]
 accu = velocity_accuracy(W, W_exact)
 print(f"  {method}, {str_exact},  accu: {accu}\n")
 method_str = "DSNE"
 title = "{} on exact embeddings with accuracy {:5.3f}".format(method_str, accu)
 scv.pl.velocity_embedding(adata, basis=basis, V=W, title=title,density=2,)
 scv.pl.velocity_embedding_stream(adata, basis=basis, V=W, title=title,density=2,)
 scv.pl.velocity_embedding_grid(adata, basis=basis, V=W, title=title,) 

Output:

DSNE velocity
DSNE embedding

Find the Colab Notebook with these code implementations here.

Wrapping up

Directional Stochastic Neighbor Embedding (DSNE) outperforms the recent successful methods including scVelo in velocity visualizations. DSNE achieves extraordinary results in cell biology applications such as messenger RNA velocity analysis, embryo development, cell transition and cell differentiation with accuracies close to unity!

Further Reading:

Original research paper

Github Repository

Share
Picture of Rajkumar Lakshmanamoorthy

Rajkumar Lakshmanamoorthy

A geek in Machine Learning with a Master's degree in Engineering and a passion for writing and exploring new things. Loves reading novels, cooking, practicing martial arts, and occasionally writing novels and poems.
Related Posts

CORPORATE TRAINING PROGRAMS ON GENERATIVE AI

Generative AI Skilling for Enterprises

Our customized corporate training program on Generative AI provides a unique opportunity to empower, retain, and advance your talent.

Upcoming Large format Conference

May 30 and 31, 2024 | 📍 Bangalore, India

Download the easiest way to
stay informed

Subscribe to The Belamy: Our Weekly Newsletter

Biggest AI stories, delivered to your inbox every week.

AI Courses & Careers

Become a Certified Generative AI Engineer

AI Forum for India

Our Discord Community for AI Ecosystem, In collaboration with NVIDIA. 

Flagship Events

Rising 2024 | DE&I in Tech Summit

April 4 and 5, 2024 | 📍 Hilton Convention Center, Manyata Tech Park, Bangalore

MachineCon GCC Summit 2024

June 28 2024 | 📍Bangalore, India

MachineCon USA 2024

26 July 2024 | 583 Park Avenue, New York

Cypher India 2024

September 25-27, 2024 | 📍Bangalore, India

Cypher USA 2024

Nov 21-22 2024 | 📍Santa Clara Convention Center, California, USA

Data Engineering Summit 2024

May 30 and 31, 2024 | 📍 Bangalore, India

Subscribe to Our Newsletter

The Belamy, our weekly Newsletter is a rage. Just enter your email below.