
Guide to Pykg2vec: A Python Library for Knowledge Graph Embedding

Pykg2vec is a robust and powerful Python library for Knowledge Graph Embedding, used to represent entity relationships across different ML domains.


A Knowledge Graph is an entity-relationship (ER) based approach to feature representation learning, with applications in domains such as natural language processing, medical sciences, finance and e-commerce. It takes the form of a graph in which the entities in the data are the nodes and the relations are the edges connecting them. As the data grows, a Knowledge Graph becomes very dense and high-dimensional and demands substantial computational resources. Knowledge Graph Embedding (KGE) alleviates this by mapping the high-dimensional representation into a compute-efficient, low-dimensional embedding.

Much recent research has concentrated on Knowledge Graph Embedding, producing powerful task-focused methods. General-purpose libraries such as PyKEEN, OpenKE and AmpliGraph support a range of KGE models and datasets, so research and deployment needs can often be met directly with open-source code. These libraries make the source code readily available, allow it to be adapted to custom datasets, help parameterize models correctly, and make it possible to compare one method against another.

However, the available open-source KGE libraries ship preset hyper-parameters that do not suit every model. The presets work for specific algorithms, dataset pipelines and benchmarks. For new datasets, these libraries mostly cannot discover the best ("golden") hyper-parameters on their own, so the user has to try different predefined values by hand to find the right ones. These drawbacks limit how well the libraries generalize, even though there is strong demand for a general-purpose tool.

Shih-Yuan Yu, Sujit Rokka Chhetri and Mohammad Abdullah Al Faruque of the University of California, Irvine, Arquimedes Canedo of Siemens Corporate Technology, and Palash Goyal of the University of Southern California have introduced a robust and powerful library for Knowledge Graph Embedding named Pykg2vec. The library addresses the difficulties of previous libraries and provides a versatile, general-purpose platform for research and deployment.

Comparison of features in Pykg2vec with those in other libraries (Source)

How Pykg2vec works

The facts in a Knowledge Graph are represented as triplets of the form (h, r, t), where h is the head entity, t is the tail entity, and r is the relation between them. Knowledge Graph Embedding learns a function that maps these high-dimensional facts to low-dimensional vectors while preserving the information in the original facts. The original facts are called positive triplets. Negative triplets are produced by sampling some of the positive triplets and corrupting either the head (?, r, t) or the tail (h, r, ?). The KGE model is trained to reward positive triplets and penalize negative ones. Loss functions such as binary cross-entropy or logistic loss are used to train the model to identify the corrupted entity or to decide whether a given triplet is positive or negative.
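
The triplet, corruption and loss ideas above can be sketched in a few lines of plain Python. This is only an illustration, not Pykg2vec's own code: the toy triplets, the function names and the choice of a TransE-style distance as the scoring function are all assumptions made for the example.

```python
import math
import random

# Hypothetical toy facts: positive (head, relation, tail) triplets.
POSITIVE_TRIPLETS = [
    ("paris", "capital_of", "france"),
    ("berlin", "capital_of", "germany"),
    ("rome", "capital_of", "italy"),
]
ENTITIES = sorted({e for h, _, t in POSITIVE_TRIPLETS for e in (h, t)})
RELATIONS = sorted({r for _, r, _ in POSITIVE_TRIPLETS})


def init_embeddings(names, dim, rng):
    """Give every name a small random vector of length `dim`."""
    return {name: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for name in names}


def transe_distance(h_vec, r_vec, t_vec):
    """TransE-style distance between h + r and t (smaller means more plausible)."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(h_vec, r_vec, t_vec)))


def score(triplet, ent_vecs, rel_vecs):
    """Plausibility score of a triplet: the negated distance (higher is better)."""
    h, r, t = triplet
    return -transe_distance(ent_vecs[h], rel_vecs[r], ent_vecs[t])


def corrupt(triplet, entities, known, rng):
    """Replace the head or the tail with a random entity to get a negative triplet."""
    h, r, t = triplet
    while True:
        e = rng.choice(entities)
        candidate = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if candidate not in known:  # skip accidental true facts
            return candidate


def logistic_loss(triplet_score, label):
    """Logistic loss with label +1 for positive and -1 for negative triplets."""
    return math.log1p(math.exp(-label * triplet_score))


if __name__ == "__main__":
    rng = random.Random(0)
    ent_vecs = init_embeddings(ENTITIES, dim=4, rng=rng)
    rel_vecs = init_embeddings(RELATIONS, dim=4, rng=rng)
    known = set(POSITIVE_TRIPLETS)
    total = 0.0
    for pos in POSITIVE_TRIPLETS:
        neg = corrupt(pos, ENTITIES, known, rng)
        total += logistic_loss(score(pos, ent_vecs, rel_vecs), +1)
        total += logistic_loss(score(neg, ent_vecs, rel_vecs), -1)
        print(pos, "->", neg)
    print("total loss before any training:", total)
```

A real model would update the embedding vectors by gradient descent so that this loss decreases; the sketch stops at computing the loss once.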

The Pykg2vec Architecture (Source)

The library can discover, on its own, the golden hyper-parameters suited to a given model-dataset pair. This is termed the Golden Setting. Users can also customize these settings. Hyper-parameter discovery is performed with a Bayesian optimizer.
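
To make the idea of default settings, user overrides and automated discovery concrete, here is a minimal sketch. It is not Pykg2vec's implementation: the default values, the search space and the objective are hypothetical, and plain random search is used as a simpler stand-in for a Bayesian optimizer, which would instead pick each new trial based on the results of earlier ones.

```python
import math
import random

# Hypothetical defaults standing in for a "golden setting"; values are illustrative.
GOLDEN_SETTING = {"learning_rate": 0.01, "hidden_size": 50, "margin": 1.0}

# Hypothetical search space: each hyper-parameter maps to a sampling function.
SEARCH_SPACE = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-4, -1),
    "hidden_size": lambda rng: rng.choice([32, 64, 128, 256]),
    "margin": lambda rng: rng.uniform(0.5, 5.0),
}


def resolve_settings(overrides=None):
    """Start from the defaults and apply any user-supplied overrides."""
    settings = dict(GOLDEN_SETTING)
    settings.update(overrides or {})
    return settings


def toy_objective(params):
    """Placeholder for 'train the model with params and return validation loss'."""
    return (math.log10(params["learning_rate"]) + 2) ** 2 + abs(params["margin"] - 1.0)


def random_search(objective, n_trials, seed=0):
    """Random search: sample n_trials configurations and keep the best one."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in SEARCH_SPACE.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


if __name__ == "__main__":
    print("defaults with one override:", resolve_settings({"margin": 2.0}))
    params, loss = random_search(toy_objective, n_trials=20)
    print("best configuration found:", params, "loss:", loss)
```

In practice the objective would be an expensive training run, which is why an optimizer that needs fewer trials than random search is attractive.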

Python Implementation of Pykg2vec

Pykg2vec is written in Python on top of the PyTorch framework. It also supports a TensorFlow implementation, and official code is provided for both the PyTorch and the TensorFlow versions.

  1. Install the Anaconda3 distribution on the local machine.
 !wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh
 !bash Anaconda3-2020.02-Linux-x86_64.sh 

  1. Create a development environment named pykg2vec by running the following commands in succession.
!bash

Then, at the shell prompt of the base conda environment, run:

conda create --name pykg2vec python=3.6

  1. Activate the development environment.
conda activate pykg2vec
  1. If the local machine has a GPU and CUDA 10.1, the following command installs a compatible PyTorch version and its dependencies.
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

If the local machine has only a CPU, use the following command instead.

conda install pytorch torchvision cpuonly -c pytorch
  1. Set up the library by cloning the source code from GitHub.
git clone https://github.com/Sujit-O/pykg2vec.git

  1. Change into the cloned directory (/content/pykg2vec/ in this setup) to work with the source files.
cd pykg2vec
  1. Install the package using the following command.
python setup.py install

  1. List the tunable parameters with the following command.
pykg2vec-train -h
  1. Train the TransE model on the FB15k benchmark dataset to get a sense of its performance.
 cd /content/pykg2vec/examples/
 pykg2vec-train -mn TransE 


Note that training takes around 2 hours on a CPU runtime. A GPU runtime gives faster training and inference.

  1. Run inference with the fully trained TransE model using the following command.
pykg2vec-infer -mn TransE
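
For readers who prefer to drive the last two steps from a script, the following is a small sketch under stated assumptions. It only wraps the two commands shown above (`pykg2vec-train -mn TransE` and `pykg2vec-infer -mn TransE`); the helper names `pick_device`, `build_commands` and `run_all` are hypothetical and not part of Pykg2vec. It also reports whether PyTorch can see a GPU, which indicates whether the long CPU training time applies.

```python
import shutil
import subprocess


def pick_device():
    """Return 'cuda' if PyTorch sees a GPU, else 'cpu' (also if PyTorch is missing)."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def build_commands(model_name="TransE"):
    """Build the train and infer commands from the steps above as argument lists."""
    return [
        ["pykg2vec-train", "-mn", model_name],
        ["pykg2vec-infer", "-mn", model_name],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print("$", " ".join(command))
        if dry_run:
            continue
        if shutil.which(command[0]) is None:
            raise FileNotFoundError(
                f"{command[0]} is not on PATH; is the pykg2vec environment active?"
            )
        subprocess.run(command, check=True)


if __name__ == "__main__":
    print("device visible to PyTorch:", pick_device())
    # Dry run by default, since a full training run can take hours on a CPU.
    run_all(build_commands("TransE"), dry_run=True)
```

Setting `dry_run=False` runs the commands for real, so it should only be done inside the activated pykg2vec environment.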

Wrapping up

Pykg2vec is a versatile Python library for training, testing and experimenting with Knowledge Graph Embedding models, datasets and configurations, in both research and teaching. It presently supports 25 state-of-the-art KGE models: SLM, ConvE, ComplEx, RotatE, CP, TuckER, SME, DistMult, NTN, ConvKB, TransE, TransH, TransR, TransD, TransM, KG2E, MuRP, InteractE, OctonionE, RESCAL, Analogy, ProjE, SimplE, HypER and QuatE.

State-of-the-art models supported by the Pykg2vec library (source)

Pykg2vec compares favourably with existing KGE libraries such as AmpliGraph, PyKEEN and OpenKE in the number of models and datasets it supports and in the way it discovers and sets hyper-parameters.


Rajkumar Lakshmanamoorthy

A geek in Machine Learning with a Master's degree in Engineering and a passion for writing and exploring new things. Loves reading novels, cooking, practicing martial arts, and occasionally writing novels and poems.