A Knowledge Graph is an entity-relationship (ER) based representation-learning approach with applications in domains such as natural language processing, medical sciences, finance and e-commerce. A Knowledge Graph grows as a dense network in which the entities of the data form the nodes and the relations between entities form the connections between those nodes. As the data grows to a large scale, the Knowledge Graph becomes very dense and high-dimensional, demanding powerful computational resources. Knowledge Graph Embedding (KGE) alleviates this issue by mapping the high-dimensional representation into a compute-efficient, low-dimensional embedded representation.
Much recent research has concentrated on Knowledge Graph Embedding, and powerful task-focused methods have been developed as a result. Generalized platforms such as PyKEEN, OpenKE and AmpliGraph are available as libraries that support KGE models and datasets, and research and deployment needs can be met directly with them. These open-source libraries make the source code readily available, enable adapting it to custom datasets, help parameterize the models correctly, and make it possible to compare one method against another.
The available open-source KGE libraries impose preset hyper-parameters that do not suit every model; rather, they work for specific algorithms, dataset pipelines and benchmarks. On new datasets, these libraries mostly fail to discover the golden hyper-parameters on their own, forcing the user to try different predefined hyper-parameters until the right ones are found. These drawbacks call the generalizability of these libraries into question, even though such generalization is in high demand.
Shih-Yuan Yu, Sujit Rokka Chhetri and Mohammad Abdullah Al Faruque of the University of California, Irvine, Arquimedes Canedo of Siemens Corporate Technology, and Palash Goyal of the University of Southern California have introduced a robust and powerful library for Knowledge Graph Embedding named Pykg2vec. It overcomes the difficulties of previous libraries and provides a versatile, generalized platform for research and other deployments.
How Pykg2vec works
The facts in a Knowledge Graph are represented as triplets of the form (h, r, t), where h is the head entity, t is the tail entity, and r is the relation between them. Knowledge Graph Embedding learns a function that maps these high-dimensional facts into low-dimensional vectors while preserving the quality of the original high-dimensional features. The original facts are usually termed positive triplets. Some of these triplets are sampled and corrupted, either in the head (?, r, t) or in the tail (h, r, ?), and are termed negative triplets. The KGE model is trained to reward positive triplets and penalize negative ones. Loss functions such as binary cross-entropy loss or logistic loss are used to find the corrupted entity or to check whether a given triplet is positive or negative.
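Pykg2vec handles this sampling internally; as a minimal illustrative sketch (the helper name and toy data below are made up, not pykg2vec's API), head/tail corruption looks like this:

```python
import random

def corrupt_triplet(triplet, entities, corrupt_head=None):
    """Create a negative triplet by replacing the head or the tail
    with a randomly drawn entity (a common KGE sampling scheme)."""
    h, r, t = triplet
    if corrupt_head is None:
        corrupt_head = random.random() < 0.5
    # Draw a replacement entity different from the one being corrupted
    target = h if corrupt_head else t
    candidates = [e for e in entities if e != target]
    e = random.choice(candidates)
    return (e, r, t) if corrupt_head else (h, r, e)

# Toy knowledge graph fact: (head, relation, tail)
entities = ["Paris", "France", "Berlin", "Germany"]
positive = ("Paris", "capital_of", "France")
negative = corrupt_triplet(positive, entities, corrupt_head=False)
print(negative)  # a corrupted tail, e.g. ("Paris", "capital_of", "Berlin")
```

Training then pushes the model to score triplets like `positive` higher than triplets like `negative`.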
The library discovers the golden hyper-parameters suitable for a given model-dataset pair on its own; this is termed the golden setting. Users can also customize these settings. The library incorporates a Bayesian optimizer to perform the hyper-parameter discovery.
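To see what such hyper-parameter discovery does conceptually, here is a simplified stand-in that uses plain random search instead of a Bayesian optimizer (everything below is illustrative only — the function, search space and objective are not pykg2vec's API):

```python
import random

def random_search(objective, space, n_trials=20, seed=0):
    """Simplified stand-in for Bayesian optimization: sample hyper-parameter
    settings from a search space and keep the best-scoring one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(choices) for name, choices in space.items()}
        score = objective(cfg)  # stands in for the validation loss of a run
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy objective standing in for "train a KGE model, return validation loss"
def toy_objective(cfg):
    return abs(cfg["learning_rate"] - 0.01) + abs(cfg["embedding_dim"] - 100) / 1000

space = {
    "learning_rate": [0.1, 0.01, 0.001],
    "embedding_dim": [50, 100, 200],
    "margin": [0.5, 1.0, 2.0],
}
best, score = random_search(toy_objective, space)
print(best)
```

A Bayesian optimizer improves on this by using the scores of past trials to choose the next setting to evaluate, rather than sampling blindly.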
Python Implementation of Pykg2vec
Pykg2vec is built in Python on top of the PyTorch framework. Nevertheless, it also supports a TensorFlow implementation, and official code is provided for both versions.
- Install the Anaconda-3 distribution on the local machine.
!wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh
!bash Anaconda3-2020.02-Linux-x86_64.sh
- Create a development environment named pykg2vec. Inside the activated base environment, provide:

conda create --name pykg2vec python=3.6
- Activate the development environment.
conda activate pykg2vec
- If the local machine has a GPU and CUDA 10.1, the following command installs the compatible PyTorch version and its dependencies.
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
On the other hand, if the local machine has only a CPU, the following command installs the CPU-only build.
conda install pytorch torchvision cpuonly -c pytorch
- Set up the library by cloning the source code from GitHub.
git clone https://github.com/Sujit-O/pykg2vec.git
- Change the directory to /content/pykg2vec/ to proceed further with the source files.
- Install the package using the following command.
python setup.py install
- Check for tunable parameters using the command.
- Train the TransE model on the FB15k benchmark dataset to sample the performance.
cd /content/pykg2vec/examples/
pykg2vec-train -mn TransE
It should be noted that training takes around 2 hours to complete in a CPU runtime. Users may opt for a GPU runtime for quick training and inference.
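TransE, the model trained above, embeds entities and relations as vectors so that h + r lands close to t for true triplets; a triplet's score is the distance ‖h + r − t‖, with lower meaning more plausible. A minimal sketch of this scoring rule (the toy embeddings below are made up, and this is not pykg2vec's internal code):

```python
import math

def transe_score(h, r, t):
    """TransE distance score: the L2 norm of (h + r - t).
    Lower means the triplet is considered more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Toy 3-dimensional embeddings, invented for illustration
paris      = [0.9, 0.1, 0.0]
berlin     = [0.1, 0.9, 0.0]
france     = [1.0, 1.0, 0.0]
capital_of = [0.1, 0.9, 0.0]

# The true fact scores lower (better) than the corrupted one
print(transe_score(paris, capital_of, france))   # 0.0 here: paris + capital_of == france
print(transe_score(berlin, capital_of, france))  # larger distance
```

Training nudges the embeddings so that real facts end up with small distances and corrupted ones with large distances.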
- Run inference on the fully trained TransE model using the following command.
pykg2vec-infer -mn TransE
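Inference in KGE is link prediction: given a query such as (h, r, ?), every candidate tail entity is scored and the candidates are ranked, with the best-scoring one as the prediction. A self-contained toy sketch of that ranking step, using a TransE-style distance (all names and embeddings here are illustrative, not pykg2vec's API):

```python
import math

def transe_score(h, r, t):
    """TransE-style distance: lower means more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def predict_tail(head, relation, candidates):
    """Rank every candidate tail entity for the query (head, relation, ?)."""
    ranked = sorted(candidates.items(),
                    key=lambda kv: transe_score(head, relation, kv[1]))
    return [name for name, _ in ranked]

# Toy entity embeddings, invented for illustration
entities = {
    "France":  [1.0, 1.0, 0.0],
    "Germany": [0.2, 1.8, 0.0],
    "Berlin":  [0.1, 0.9, 0.0],
}
paris      = [0.9, 0.1, 0.0]
capital_of = [0.1, 0.9, 0.0]

# Query: (Paris, capital_of, ?) — the best-ranked tail comes first
print(predict_tail(paris, capital_of, entities))  # "France" ranked first
```

Benchmarks such as FB15k evaluate exactly this ranking, reporting metrics like mean rank and hits@10 for the true tail.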
Pykg2vec is a versatile Python library for training, testing, experimenting with, researching and teaching the models, datasets and configurations related to Knowledge Graph Embedding. Pykg2vec presently supports 25 state-of-the-art KGE models: SLM, ConvE, ComplEx, RotatE, CP, TuckER, SME, DistMult, NTN, ConvKB, TransE, TransH, TransR, TransD, TransM, KG2E, MuRP, InteractE, OctonionE, RESCAL, Analogy, ProjE, SimplE, HypER and QuatE.