Creating an up-to-date and accurate knowledge base has been a significant challenge in machine learning. Studies show that inefficient knowledge search can adversely affect enterprise performance and productivity. On the other hand, a recent study found that having a knowledge base eliminates much of the need to search for information, which can save up to six hours a week.
To this end, in 2014, Microsoft introduced Project Alexandria. Built on years of Microsoft’s work in knowledge mining research, Alexandria is a system for unsupervised, high-precision knowledge base construction.
Alexandria uses a probabilistic program that describes how knowledge base facts are converted into unstructured text. Running this program in reverse lets the system mine topics from big data and link them into a knowledge base. The most significant advantage of the approach is that the task information is encoded in the probabilistic program itself, so large amounts of labelled data are not required. In addition, it allows the process to learn unsupervised, meaning it can complete tasks with little or no human input.
Using a probabilistic program allows uncertainty in the text to propagate through to the retrieved facts. This, in turn, increases accuracy and helps in merging facts from multiple sources. Since Alexandria doesn’t require labelled training data, only minimal manual input is needed to construct knowledge bases.
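As a rough illustration of how uncertainty can carry through to merged facts, the sketch below combines per-source beliefs about one property value by multiplying them and renormalising. This rests on a naive independence assumption chosen for brevity and is not Alexandria’s actual model; the function `merge_beliefs`, the `floor` smoothing value, and the sample data are all hypothetical.

```python
def merge_beliefs(extractions, floor=0.01):
    """Combine per-source beliefs about one property into a single posterior.

    extractions: list of dicts mapping a candidate value to the probability
    that the source text supports it (illustrative inputs, not real data).
    floor: small probability assigned to values a source did not mention,
    so that one silent source does not zero out a candidate (an assumption).
    """
    # Collect every candidate value mentioned by any source.
    candidates = set()
    for dist in extractions:
        candidates.update(dist)

    # Multiply the per-source probabilities, treating sources as independent.
    scores = {}
    for value in candidates:
        score = 1.0
        for dist in extractions:
            score *= dist.get(value, floor)
        scores[value] = score

    # Renormalise so the merged beliefs sum to one.
    total = sum(scores.values())
    return {value: score / total for value, score in scores.items()}


# Three hypothetical sources with uncertain readings of a founding year.
sources = [
    {"1975": 0.7, "1976": 0.3},
    {"1975": 0.6, "1985": 0.4},
    {"1975": 0.9, "1976": 0.1},
]
posterior = merge_beliefs(sources)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 4))
```

Here the three sources mostly agree on one value, so the merged belief concentrates on it, while values that appear in only some sources are heavily discounted rather than discarded outright.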
Initially, the focus of Alexandria was to mine knowledge from websites like Wikipedia. But in the last few years, its coverage has expanded to include enterprise data such as documents, messages, and emails.
The programme works in two steps:
- Topic mining: This step discovers the topics in large document collections and keeps them maintained as records change or newer ones are created. The algorithm runs a query engine to extract the snippets from each document that have a high probability of containing knowledge. The parsing procedure identifies hundreds of documents that may contain property values. This approach is called template matching: the model looks for patterns, or templates, in the text. The algorithm uses unsupervised learning to create such templates from both structured and unstructured data. This is achieved by running the probabilistic program backwards using Infer.NET.
- Topic linking: This step brings the mined knowledge together from a variety of sources into a single unified knowledge base. It identifies all duplicate and overlapping entities and merges them using a clustering process driven by probabilistic programming. This ensures consistency throughout Alexandria’s pipeline.
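The two steps above can be pictured with a toy example. The sketch below is only an assumption-laden stand-in: it uses hand-written regular expressions where Alexandria learns its templates without supervision, and exact matching on a normalised name where Alexandria uses probabilistic clustering. The names `TEMPLATES`, `SUFFIXES`, `mine`, `link`, and `normalise`, along with the sample snippets, are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical hand-written templates: (property name, text pattern).
TEMPLATES = [
    ("founded", re.compile(
        r"(?P<entity>[A-Z][\w .]*?) was founded in (?P<value>\d{4})")),
    ("founded", re.compile(
        r"(?P<entity>[A-Z][\w .]*?) \(founded (?P<value>\d{4})\)")),
    ("headquarters", re.compile(
        r"(?P<entity>[A-Z][\w .]*?) is headquartered in (?P<value>[A-Z]\w+)")),
]

# Company suffixes ignored when comparing names (an illustrative choice).
SUFFIXES = {"ltd", "inc", "corp", "corporation", "limited"}


def mine(snippets):
    """Topic mining stand-in: match each template against each snippet."""
    facts = []
    for snippet in snippets:
        for prop, pattern in TEMPLATES:
            for match in pattern.finditer(snippet):
                entity = match.group("entity").strip()
                facts.append((entity, prop, match.group("value")))
    return facts


def normalise(name):
    """Lower-case a name, drop punctuation and common company suffixes."""
    tokens = re.sub(r"[^\w\s]", "", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def link(facts):
    """Topic linking stand-in: group facts whose entity names normalise alike."""
    topics = defaultdict(lambda: defaultdict(set))
    for entity, prop, value in facts:
        topics[normalise(entity)][prop].add(value)
    return {key: {prop: sorted(values) for prop, values in props.items()}
            for key, props in topics.items()}


snippets = [
    "Contoso Ltd. was founded in 1999.",
    "Contoso (founded 1999) makes widgets.",
    "Contoso Ltd is headquartered in Redmond.",
    "Fabrikam Inc. was founded in 2004.",
]
facts = mine(snippets)
topics = link(facts)
for name, props in topics.items():
    print(name, props)
```

Three differently written mentions of the same company collapse into one topic that carries both of its properties, which is the effect topic linking aims for. A probabilistic approach would additionally attach a confidence to each match and each merge instead of treating them as certain.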
Microsoft Viva Topics & Alexandria
Project Alexandria’s technology plays a vital role in the architecture of Microsoft’s new platform, Viva Topics. It is one of the four modules of Microsoft Viva, an employee experience platform for communications, learning, resources, and knowledge management. Viva Topics uses AI to organise resources into topics, which are then delivered via applications such as Microsoft Office and SharePoint. Thus, the platform transforms big data into big knowledge.
Several Microsoft technologies, each with its own speciality, process rich metadata that feeds into Viva Topics. Speaking on the Alexandria-Viva Topics collaboration, Naomi Moneypenny, who leads Viva Topics product development, said, “The Project Alexandria team and technologies have been instrumental to delivering the innovative experiences for customers in Viva Topics. We value their highly collaborative approach to working with many other specialist teams across Microsoft.”
Microsoft plans to internationalise the knowledge base platform soon, allowing information in multiple languages to be processed. In addition, the platform aims to automatically translate data extracted from any language.
This will help users customise the knowledge discovery process. It will enable developers to create a knowledge base with an architecture tailored to the needs of each organisation, using the familiar terminology and language of the people who work there.