Guide to Pykg2vec: A Python Library for Knowledge Graph Embedding

Pykg2vec is a robust and powerful Python library for Knowledge Graph Embedding to represent Entity Relationships in different ML domains

A Knowledge Graph is an entity-relationship (ER) based approach to feature representation learning, with applications in domains such as natural language processing, medical sciences, finance and e-commerce. It takes the form of a graph in which the entities in the data are the nodes and the relations are the connections between those nodes. As the data grows, a Knowledge Graph becomes very dense and high-dimensional, demanding powerful computational resources. Knowledge Graph Embedding (KGE) alleviates this issue by mapping the high-dimensional representation into a compute-efficient, low-dimensional embedded representation.

Much recent research has concentrated on Knowledge Graph Embedding, producing powerful task-focused methods. General-purpose libraries such as PyKEEN, OpenKE and AmpliGraph support a range of KGE models and datasets, so research and deployment needs can often be met directly with them. These open-source libraries make the source code readily available, allow it to be adapted to custom datasets, help parameterize models correctly, and make it possible to compare one method against another.

The available open-source KGE libraries ship with preset hyper-parameters that do not suit every model; they work for specific algorithms, dataset pipelines and benchmarks. For new datasets, these libraries mostly fail to discover the best ("golden") hyper-parameters on their own, forcing the user to try different predefined values to find the right ones. These drawbacks call the generalizability of these libraries into question, even though demand for general-purpose tooling is high.

Shih-Yuan Yu, Sujit Rokka Chhetri and Mohammad Abdullah Al Faruque of the University of California, Irvine, Arquimedes Canedo of Siemens Corporate Technology, and Palash Goyal of the University of Southern California have introduced a robust and powerful library for Knowledge Graph Embedding, named Pykg2vec. This library overcomes the difficulties of previous libraries and provides a versatile, generalized platform for research and deployment.

Comparison of features in Pykg2vec with those in other libraries (Source)

How Pykg2vec works

The facts in a Knowledge Graph are represented as triplets of the form (h, r, t), where h is the head entity, t is the tail entity, and r is the relation between them. Knowledge Graph Embedding learns a function that maps these high-dimensional facts into low-dimensional vectors while preserving as much of the original information as possible. The original facts are termed positive triplets. Some of these triplets are sampled and either their heads (?, r, t) or their tails (h, r, ?) are corrupted; the results are termed negative triplets. The KGE model is trained to reward positive triplets and penalize negative ones. Loss functions such as binary cross-entropy loss or logistic loss are used to train the model to find the corrupted entity or to check whether a given triplet is positive or negative.
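The corruption step and the reward/penalty objective described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Pykg2vec's implementation: the helper names (`corrupt_triplet`, `transe_score`, `logistic_loss`), the toy entities and the hand-picked 2-dimensional embeddings are all hypothetical, and the translation-based score -||h + r - t|| is the TransE-style scoring function, used here only as one example of a KGE score.

```python
import math
import random


def corrupt_triplet(triplet, entities, rng):
    """Build a negative triplet by replacing the head or the tail with a different entity."""
    h, r, t = triplet
    if rng.random() < 0.5:
        # Corrupt the head: (?, r, t)
        candidates = [e for e in entities if e != h]
        return (rng.choice(candidates), r, t)
    # Corrupt the tail: (h, r, ?)
    candidates = [e for e in entities if e != t]
    return (h, r, rng.choice(candidates))


def transe_score(h_vec, r_vec, t_vec):
    """TransE-style plausibility: negative L2 distance of h + r from t (higher is more plausible)."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(h_vec, r_vec, t_vec)))


def logistic_loss(score, label):
    """Logistic loss log(1 + exp(-label * score)); label is +1 for positive, -1 for negative."""
    return math.log1p(math.exp(-label * score))


# Hypothetical toy embeddings, chosen by hand so that Paris + capital_of == France.
entities = ["Paris", "France", "Berlin", "Germany"]
embeddings = {
    "Paris": [1.0, 0.0],
    "France": [1.0, 1.0],
    "Berlin": [3.0, 0.0],
    "Germany": [3.0, 1.0],
}
relation_vec = {"capital_of": [0.0, 1.0]}


def score(triplet):
    h, r, t = triplet
    return transe_score(embeddings[h], relation_vec[r], embeddings[t])


positive = ("Paris", "capital_of", "France")
negative = corrupt_triplet(positive, entities, random.Random(0))

pos_loss = logistic_loss(score(positive), +1)
neg_loss = logistic_loss(score(negative), -1)
print("positive:", positive, "score:", score(positive), "loss:", pos_loss)
print("negative:", negative, "score:", score(negative), "loss:", neg_loss)
```

Training would adjust the embedding vectors to reduce the summed loss, pulling positive triplets together and pushing negative ones apart. Practical implementations usually also filter out corrupted triplets that happen to be true facts, which this sketch does not do.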

The Pykg2vec Architecture (Source)

The library discovers the best-performing ("golden") hyper-parameters for a model-dataset pair on its own; the result is termed the Golden Setting. Users can also customize these settings. The library uses a Bayesian optimizer to perform the hyper-parameter discovery.
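To make the idea of automatic hyper-parameter discovery concrete, here is a minimal sketch of a search loop over a small search space. It deliberately swaps in plain random search for the Bayesian optimizer the library uses, because random search fits in a few lines. The search space, the `toy_validation_loss` function and all values are hypothetical stand-ins, not Pykg2vec's golden settings or its API.

```python
import math
import random

# Hypothetical search space; the names and ranges are illustrative only.
SEARCH_SPACE = {
    "learning_rate": (1e-4, 1e-1),       # sampled log-uniformly
    "hidden_size": [16, 32, 64, 128],    # sampled from a fixed set
    "margin": (0.5, 2.0),                # sampled uniformly
}


def sample_params(rng):
    """Draw one random configuration from SEARCH_SPACE."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "learning_rate": 10 ** rng.uniform(math.log10(lo), math.log10(hi)),
        "hidden_size": rng.choice(SEARCH_SPACE["hidden_size"]),
        "margin": rng.uniform(*SEARCH_SPACE["margin"]),
    }


def toy_validation_loss(params):
    """Stand-in for training a KGE model and measuring its validation loss.

    The minimum is placed by hand at learning_rate=0.01, hidden_size=64, margin=1.0.
    """
    return (
        (math.log10(params["learning_rate"]) + 2.0) ** 2
        + ((params["hidden_size"] - 64) / 64) ** 2
        + (params["margin"] - 1.0) ** 2
    )


def random_search(n_trials, seed=0):
    """Evaluate n_trials random configurations and return the best one found."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        params = sample_params(rng)
        trials.append((toy_validation_loss(params), params))
    best_loss, best_params = min(trials, key=lambda item: item[0])
    return best_loss, best_params, trials


best_loss, best_params, trials = random_search(n_trials=50)
print("best loss:", best_loss)
print("best params:", best_params)
```

A Bayesian optimizer differs in that it uses the losses of earlier trials to decide where to sample next, rather than drawing every configuration independently.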

Python Implementation of Pykg2vec

Pykg2vec is built in Python on top of the PyTorch framework. It also supports a TensorFlow implementation; official code is provided for both the PyTorch and TensorFlow versions.

  1. Install the Anaconda 3 distribution on the local machine.

  2. Create a development environment named pykg2vec by running the following command from the base conda environment.

conda create --name pykg2vec python=3.6

  3. Activate the development environment.
conda activate pykg2vec
  4. If the local machine has a GPU runtime and CUDA 10.1, the following command installs the compatible PyTorch version and its dependencies.
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

If the local machine has only a CPU, use the following command instead.

conda install pytorch torchvision cpuonly -c pytorch
  5. Set up the library by cloning the source code from GitHub.
git clone

  6. Change the directory to /content/pykg2vec/ to proceed further with the source files.
cd pykg2vec
  7. Install the package using the following command.
python setup.py install

  8. Check for tunable parameters using the following command.
pykg2vec-train -h
  9. Train the TransE model on the FB15k benchmark dataset to gauge its performance.
cd /content/pykg2vec/examples/
pykg2vec-train -mn TransE

Note that training takes around 2 hours to complete on a CPU runtime. Users may opt for a GPU runtime for quicker training and inference.

  10. Run inference with the fully trained TransE model using the following command.
pykg2vec-infer -mn TransE
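
Since the install steps above differ for GPU and CPU machines, it can help to confirm which runtime PyTorch actually sees before starting a long training run. The sketch below relies only on `torch.__version__` and `torch.cuda.is_available()`; the helper name `describe_torch_runtime` is hypothetical, and the function falls back to a plain message when PyTorch is not installed.

```python
def describe_torch_runtime():
    """Report whether PyTorch is importable and whether it can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment."
    device = "GPU (CUDA)" if torch.cuda.is_available() else "CPU only"
    return f"PyTorch {torch.__version__} detected, runtime: {device}"


print(describe_torch_runtime())
```

Running this inside the activated pykg2vec environment shows whether the CUDA or the CPU-only install command took effect.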

Wrapping up

Pykg2vec is a versatile Python library for training, testing and experimenting with the models, datasets and configurations related to Knowledge Graph Embedding, for both research and education. Pykg2vec presently supports 25 state-of-the-art KGE models: SLM, ConvE, ComplEx, RotatE, CP, TuckER, SME, DistMult, NTN, ConvKB, TransE, TransH, TransR, TransD, TransM, KB2E, MuRP, InteractE, OctonionE, RESCAL, Analogy, ProjE, SimplE, HypER and QuatE.

State-of-the-art models supported by the Pykg2vec library (source)

The Pykg2vec library outshines existing KGE libraries such as AmpliGraph, PyKEEN and OpenKE in the number of models, the number of datasets, and the way it discovers and sets hyper-parameters.

Rajkumar Lakshmanamoorthy
A geek in Machine Learning with a Master's degree in Engineering and a passion for writing and exploring new things. Loves reading novels, cooking, practicing martial arts, and occasionally writing novels and poems.
