Google AI research scientist Bryan Perozzi and research intern Qi Zhu have released a study that tackles the problem of training Graph Neural Networks (GNNs) on biased datasets. Titled ‘Shift-Robust GNNs: Overcoming the Limitations of Localized Graph Training Data’, the research shows how a new method can measure the distributional difference between biased training data and a graph’s true inference distribution. The magnitude of the shift between the two probability distributions also indicates the amount of bias.
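The paper quantifies this shift with central moment discrepancy (CMD), a distance between the statistical moments of two sample distributions. Below is a minimal NumPy sketch of the idea; it drops the interval-normalisation constants of the full CMD definition, and the array shapes are illustrative assumptions rather than the paper’s exact setup.

```python
import numpy as np

def cmd(x, y, n_moments=5):
    """Central moment discrepancy between two sample matrices.

    x, y: arrays of shape (n_samples, n_features), e.g. hidden
    representations of the biased training nodes versus an
    unbiased (IID) sample of nodes from the same graph.
    """
    mx, my = x.mean(axis=0), y.mean(axis=0)
    # First-order term: distance between the two means.
    dist = np.linalg.norm(mx - my)
    cx, cy = x - mx, y - my
    # Higher-order terms: distances between central moments.
    for k in range(2, n_moments + 1):
        dist += np.linalg.norm((cx ** k).mean(axis=0) - (cy ** k).mean(axis=0))
    return dist
```

The returned scalar grows with the distance between the two distributions, so a larger value signals a stronger shift and, by extension, more bias in the training sample.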
Machine learning models struggle to generalise from biased training data, and the problem worsens as this shift grows wider. When classification performance was measured with the F-score, a standard evaluation metric, domain shifts were found to reduce it by 15-20 per cent.
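For context, the F-score (F1) is the harmonic mean of precision and recall, so a 15-20 per cent drop reflects degradation in both. A quick illustration with scikit-learn, using made-up label arrays:

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 1, 0, 1, 1]  # ground-truth labels
y_pred = [0, 1, 0, 0, 1, 1]  # model predictions

# F1 = 2 * precision * recall / (precision + recall)
# Here precision = 1.0 and recall = 0.75, so F1 ≈ 0.857.
print(f1_score(y_true, y_pred))
```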
As the distribution shift increases, classification accuracy decreases, making it harder for GNNs to generalise as the gap between the training and test datasets widens. The study reduces the distribution shift between labelled training data and unlabelled data using a shift-robust regulariser: the technique measures the domain shift during training and penalises it, forcing the model to discount the training bias as far as possible.
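A minimal PyTorch sketch of such a regularised objective, assuming the shift is measured with a CMD-style moment distance as above; the function name, the weighting term lam, and the way the hidden representations are obtained are all illustrative assumptions, not the paper’s exact implementation.

```python
import torch
import torch.nn.functional as F

def shift_robust_loss(logits, labels, h_train, h_iid, lam=1.0, n_moments=5):
    """Cross-entropy plus a shift-penalising regulariser.

    h_train: hidden representations of the (biased) labelled nodes.
    h_iid:   hidden representations of an unbiased sample of nodes
             drawn from the whole graph.
    The second term penalises the moment distance between the two,
    pushing the encoder towards shift-invariant features.
    """
    ce = F.cross_entropy(logits, labels)
    # First-order moment (mean) discrepancy.
    reg = torch.norm(h_train.mean(dim=0) - h_iid.mean(dim=0))
    cx = h_train - h_train.mean(dim=0)
    cy = h_iid - h_iid.mean(dim=0)
    # Higher-order central moment discrepancies.
    for k in range(2, n_moments + 1):
        reg = reg + torch.norm((cx ** k).mean(dim=0) - (cy ** k).mean(dim=0))
    return ce + lam * reg
```

Minimising this combined loss trades off fitting the biased labels against keeping the learned representations distributionally close to the rest of the graph, which is what allows the model to discount the training bias.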
Results from the research showed that the resulting model, SR-GNN (Shift-Robust GNN), was more effective on biased training datasets and beat standard GNN baselines in terms of accuracy, reducing the negative impact of biased data by 30 to 40 per cent.