# Complete Guide to DSNE: A Velocity Visualization Tool

DSNE handles high-dimensional velocity visualization problems such as cell differentiation and embryo development with greater accuracy

Handling high-dimensional data is a non-trivial task in Exploratory Data Analysis. Numerous tools ranging from the famous Principal Component Analysis (PCA) to the Stochastic Neighbour Embedding (SNE) variants focus on embedding the high-dimensional data on a low-dimensional visualization, especially in a two-or three-dimensional representation. The vanilla Stochastic Neighbor Embedding converts the high-dimensional Euclidean distance from a point to its K-nearest neighbors into conditional probabilities that represent similarities. Stochastic Neighbour Embedding variants are developed for task-specific applications, including t-SNE, Hierarchical-SNE, RNA-velocity techniques, and UMAP.

Biological science often deals with high-dimensional velocity data such as embryo development, cell fluid transfer, cell-type transition, cell differentiation, and mRNA velocity. Recent work by Volker Bergen, et al., the scVelo has introduced a powerful velocity visualization in a transient cell system. However, the method relies on an intuitive approach rather than a solid mathematical approach. This method yields unreliable results for highly dynamic and unpredictable cell systems such as messenger-RNA (mRNA) velocity visualization.

Songting Shi of the Peking University, China, has developed a mathematically sound approach to visualize cell components’ velocity. This method is named as DSNE, the abbreviation for Directional Stochastic Neighbor Embedding. It can be viewed as a variant of the vanilla SNE proposed to handle velocity visualization problems such as cell differentiation and embryo development.

#### THE BELAMY

DSNE converts the high-dimensional Euclidean distance between the unit-length velocity and the unit-length direction from the point of interest to the nearest neighbor points into a conditional probability distribution. This conditional probability distribution represents the similarity between the velocity and the direction of cell particles.

Mathematically, the conditional probability can be expressed as:

Similarly, the conditional probability distribution for the dimensionally-reduced low-dimensional counterparts can be mathematically expressed as:

An entropy loss function (the Kullback-Leibler Divergence loss) is incorporated to measure the impurity between the high-dimensional representations and the low-dimensional representations. DSNE is optimized during training by reducing this loss function. The loss function can be expressed mathematically as:

## Pancreas Visualization using DSNE

Pancreas is one of the sensitive and vital organs in the human body. Pancreas looks after food digestion and blood sugar regulation. Pancreas cells possess highly dynamic functionality inter- and intra-cells. Timely study of Pancreas leads to better medication in case of impairment. Pancreas cell velocity can be analyzed and visualized using DSNE’s python package.

1. DSNE is now available as a PyPi package. We can simply pip install it.
`!pip install dsne`
1. Create the development environment by importing the necessary modules
``` import numpy as np
import scvelo as scv
from scipy.sparse import issparse
from dsne import DSNE, DSNE_approximate ```
1. Configure the visualization settings to suit the application and download the built-in pancreas dataset.
``` scv.settings.verbosity = 3  # show errors(0), warnings(1), info(2), hints(3)
scv.settings.presenter_view = True  # set max width size for presenter view
scv.settings.set_figure_params('scvelo')  # for beautified visualization

Output:

1. Define a helper function to obtain the velocity components that are required to plot pancreas velocity.
``` def get_X_V_Y(adata,vkey="velocity",
xkey="Ms",
basis=None,
gene_subset=None,
):
if gene_subset is not None:
subset &= var_names_subset if len(var_names_subset) > 0 else gene_subset
xkey = xkey if xkey in adata.layers.keys() else "spliced"
basis = 'umap' if basis is None else basis
X = np.array(
)
V = np.array(
)
# V -= np.nanmean(V, axis=1)[:, None]
Y =np.array(
)
nans = np.isnan(np.sum(V, axis=0))
if np.any(nans):
X = X[:, ~nans]
V = V[:, ~nans]
return X,V,Y ```
1. Get the two-dimensional velocity embedding of high-dimensional pancreas velocity data with standard DSNE.
``` X,V,X_2d = get_X_V_Y(adata,vkey="velocity",xkey="Ms",basis="umap")
V_2d = DSNE(X, V, Y=X_2d,
perplexity=3.0,
K=16,
threshold_V=1e-8,
separate_threshold=1e-8,
max_iter=600,
mom_switch_iter=250,
momentum=0.5,
final_momentum=0.8,
eta=0.1,
epsilon_kl=1e-16,
epsilon_dsne=1e-16,
seed=6,
random_state=None,
copy_data=False,
with_norm=True,
verbose=True) ```
1. Plot the standard DSNE velocity data in a two-dimensional plot.
``` adata.obsm["X_DSNE"] = X_2d
title ="DSNE"

Output:

1. Plot the compute-efficient DSNE approximate version of the above plots.
``` title ="DSNE-approximate"
V_2d = DSNE_approximate(X, V, Y=X_2d,
perplexity=3.0,
K=16,
threshold_V=1e-8,
separate_threshold=1e-8,
seed=6,
random_state=None,
copy_data=False,
with_norm=True,
verbose=True)

Output:

## Comparison of DSNE with scVelo on Exact Simulation

1. Import the necessary packages and data; and configure hyper-parameters.
``` import os
import numpy as np
import scvelo as scv
from dsne import DSNE, DSNE_approximate
N=500
D=300
d=2
K=16
perplexity =6
n_rep=1
exact = False
with_norm = True
basis = 'exact_embeddings'
verbose = False ```
1. Define helper functions for determining unit length and velocity accuracy.
``` def unitLength(V):
V_ = V/np.sqrt(np.sum(V*V,axis=1,keepdims=True))
return V_
def velocity_accuracy(V, V_exact):
V_unit = unitLength(V)
V_exact_unit = unitLength(V_exact)
accu = np.sum( V_unit* V_exact_unit )/(V.shape[0]*1.)
return accu ```
1. Define a function to preprocess and simulate the built-in data.
``` def simulate_data(N=50, D=3, d=2, save =True, file_name_prefix ="./data" ):
if not os.path.exists(file_name_prefix):
print("Directory: {} do not exist, create it! \n".format(os.path.abspath(file_name_prefix)))
os.makedirs(os.path.abspath(file_name_prefix))
V_2d = np.random.randn(*(N * 3, d)) * 6
err_2d = np.random.randn(*(N * 3, d))*2
x_1 = np.asarray([0, ] * d)
x_2 = np.asarray([50, ] * d)
x_3 = np.asarray([160, ] * d)
X_2d = np.zeros_like(V_2d)
X_2d[0, :] = x_1
X_2d[N, :] = x_2
X_2d[N * 2, :] = x_3
for i in np.arange(N - 1):
X_2d[i + 1, :] = X_2d[i, :] + V_2d[i, :] + err_2d[i,:]
X_2d[i + N + 1, :] = X_2d[i + N, :] + V_2d[i + N, :] + err_2d[i + N, :]
X_2d[i + N * 2 + 1, :] = X_2d[i + N * 2, :] + V_2d[i + N * 2, :] +  err_2d[i + N * 2, :]
y = np.asarray([0, ] * N + [1, ] * N + [2, ] * N)
U = np.array(np.random.randn(*(d, D)))
X = X_2d.__matmul__(U)
V = V_2d.__matmul__(U)
adata = AnnData(X=X, layers={"velocity": V},obs={"clusters": y}, obsm={"X_exact_embeddings":X_2d, "V_exact_embeddings":V_2d})
if save:
file_name = file_name_prefix+"simulated_data_N_{}_D_{}.h5hd".format(N,D)
1. Process the data for simulation and prepare it for two-dimensional plotting.
``` adata = simulate_data(N=N,D=D,d=d,save=False)
X_basis = f"X_{basis}"
X = np.asarray(X, dtype=np.float64)
V = np.asarray(V, dtype=np.float64)
Y = None
if Y is None:
print("Do not get the low dimesnional embedding Y! \n")
# raise
Y = np.asarray(Y, dtype=np.float64) ```
1. Plot the simulation with the recent model, scVelo and determine the accuracy.
``` adata_tmp = AnnData(X=X, obsm={"X_umap": Y}, layers={"velocity": V, "spliced": X})
vkey = "velocity_scvelo_original"
str_exact = "exact" if exact else "approx"
method = 'scvelo_velocity_original'
accu = velocity_accuracy(W, W_exact)
print(f"  {method}, {str_exact},  accu: {accu}\n")
method_str = "scVelo"
title = "{} on exact embeddings with accuracy {:5.3f}".format(method_str, accu)

Output:

1. Plot the exact simulation visualization with DSNE-Approximate version and determine the accuracy.
``` W = DSNE_approximate(X, V, Y=Y,
perplexity=perplexity,
pca_d=None,
threshold_V=1e-8,
separate_threshold=1e-8,
seed=16,
random_state=None,
copy_data=False,
verbose=verbose)
vkey = "velocity_scvelo"
str_exact = "exact" if exact else "approx"
method = "DSNE_approximate"
accu = velocity_accuracy(W, W_exact)
print(f"  {method}, {str_exact},  accu: {accu}\n")
method_str = "DSNE-approximate"
title = "{} on exact embeddings with accuracy {:5.3f}".format(method_str, accu)

Output:

1. Plot the exact simulation visualization with standard DSNE and determine the accuracy.
``` W = DSNE(X, V, Y=Y,
K= K,
perplexity=perplexity,
pca_d=None,
threshold_V=1e-8,
separate_threshold=1e-8,
max_iter=1000,
mom_switch_iter=250,
momentum=0.5,
final_momentum=0.8,
eta=0.1,
epsilon_kl=1e-16,
epsilon_dsne=1e-16,
with_norm=with_norm,
seed=16,
random_state=None,
copy_data=True,
verbose=verbose)
vkey = "velocity_dsne"
method = 'DSNE'
str_exact = "exact" if exact else "approx"
accu = velocity_accuracy(W, W_exact)
print(f"  {method}, {str_exact},  accu: {accu}\n")
method_str = "DSNE"
title = "{} on exact embeddings with accuracy {:5.3f}".format(method_str, accu)

Output:

Find the Colab Notebook with these code implementations here.

## Wrapping up

Directional Stochastic Neighbor Embedding (DSNE) outperforms the recent successful methods including scVelo in velocity visualizations. DSNE achieves extraordinary results in cell biology applications such as messenger RNA velocity analysis, embryo development, cell transition and cell differentiation with accuracies close to unity!

Original research paper

Github Repository

## More Great AIM Stories

### 8 Fullstack Frameworks Every Developer Must Master

A geek in Machine Learning with a Master's degree in Engineering and a passion for writing and exploring new things. Loves reading novels, cooking, practicing martial arts, and occasionally writing novels and poems.

## Our Upcoming Events

Conference, in-person (Bangalore)
Machine Learning Developers Summit (MLDS) 2023
19-20th Jan, 2023

Conference, in-person (Bangalore)
Rising 2023 | Women in Tech Conference
16-17th Mar, 2023

Conference, in-person (Bangalore)
Data Engineering Summit (DES) 2023
27-28th Apr, 2023

Conference, in-person (Bangalore)
MachineCon 2023
23rd Jun, 2023

Conference, in-person (Bangalore)
Cypher 2023
20-22nd Sep, 2023

### Whatsapp group

Discover special offers, top stories, upcoming events, and more.

### Discord Server

Stay Connected with a larger ecosystem of data science and ML Professionals