What Is Microsoft’s Project Alexandria?

Studies show that inefficiencies in knowledge searching can adversely affect enterprise performance and productivity.

Creating an up-to-date and accurate knowledge base has long been a significant challenge in machine learning. A recent study found that a well-maintained knowledge base reduces the need for information searching, saving employees up to six hours a week.

To this end, Microsoft introduced Project Alexandria in 2014. Built on years of Microsoft's knowledge-mining research, Alexandria is a system for unsupervised, high-precision knowledge base construction.

Project Alexandria 

Alexandria uses probabilistic programs that model how knowledge base facts give rise to unstructured text; running inference on this model in reverse extracts facts from text. Through this process, topics are mined and linked to build a knowledge base from big data. The most significant advantage of the approach is that the task information is encoded in the probabilistic program itself, so large amounts of labelled data are not required. This also allows the process to run unsupervised, meaning it can complete tasks without any human input.


Using a probabilistic program allows uncertainty in the text to propagate through to the retrieved facts. This in turn increases accuracy and helps in merging facts from multiple sources. Since Alexandria doesn't require labelled training data, only minimal manual input is needed to construct knowledge bases.
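To make the idea concrete, here is a minimal sketch of running a "facts produce text" model backwards to extract facts, and of letting agreement across sources act as a crude confidence score. This is an illustration only, not Microsoft's implementation: the template set and function names (`extract_facts`, `merge_with_uncertainty`) are hypothetical, and real Alexandria uses probabilistic programs in Infer.NET rather than regular expressions.

```python
# Toy sketch: a generative "facts -> text" model inverted to extract facts.
# Not Alexandria's actual method; templates and names are illustrative.
import re
from collections import defaultdict

# Hypothetical templates describing how a fact could be rendered as text.
TEMPLATES = [
    "{entity} was founded in {year}.",
    "{entity}, founded in {year},",
]

def extract_facts(sentence, templates=TEMPLATES):
    """Run the generative model 'backwards': find (entity, year) values
    that would have produced the observed sentence under some template."""
    facts = []
    for t in templates:
        # Turn the template into a regex with named capture groups.
        pattern = (re.escape(t)
                   .replace(r"\{entity\}", r"(?P<entity>[\w\s]+)")
                   .replace(r"\{year\}", r"(?P<year>\d{4})"))
        m = re.search(pattern, sentence)
        if m:
            facts.append((m.group("entity").strip(), m.group("year")))
    return facts

def merge_with_uncertainty(sentences):
    """Aggregate extractions across sources: the more sources that agree
    on a value, the higher its confidence (a stand-in for the posterior
    probability a real probabilistic program would compute)."""
    counts = defaultdict(int)
    for s in sentences:
        for fact in extract_facts(s):
            counts[fact] += 1
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}
```

The point of the sketch is the direction of the computation: the templates describe how a fact becomes text, and extraction is inference over which fact best explains the observed sentence, with disagreement between sources surviving as lower confidence rather than being discarded.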

Initially, the focus of Alexandria was to mine knowledge from websites like Wikipedia. But in the last few years, its coverage has expanded to include enterprise data such as documents, messages, and emails.

The programme works in two steps:

  • Topic mining: Alexandria discovers the topics a person is looking for across large document collections and maintains those topics as records change or new ones are created. A query engine extracts snippets from each document with a high probability of containing knowledge, and the parsing procedure identifies hundreds of documents that may contain property values. This approach is called template matching: the model looks for patterns, or templates, in the text. The algorithm learns such templates from both structured and unstructured data without supervision, by running the probabilistic program backwards using Infer.NET.
  • Topic linking: This step brings knowledge from a variety of sources together into a single unified base. It identifies duplicate and overlapping entities and merges them using a probabilistic-programming-driven clustering process, ensuring consistency throughout Alexandria's pipeline.
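The linking step can be pictured as clustering entity records by a canonical key and merging their properties. The sketch below is purely illustrative: Alexandria actually clusters with a probabilistic model over names and property values, whereas this toy version (with hypothetical names `normalize` and `link_topics`) uses simple string normalisation.

```python
# Toy sketch of topic linking: merge duplicate/overlapping entity records
# from multiple sources into one unified entry per entity.
# Illustrative only; not Alexandria's actual clustering algorithm.

def normalize(name):
    """Crude canonical key: lowercase and keep only letters/digits.
    A real system would score matches probabilistically instead."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def link_topics(records):
    """records: list of (entity_name, {property: value}) pairs drawn
    from many sources. Returns one merged record per entity, keeping
    every surface name and every observed property value."""
    merged = {}
    for name, props in records:
        key = normalize(name)
        entry = merged.setdefault(key, {"names": set(), "props": {}})
        entry["names"].add(name)
        for prop, value in props.items():
            entry["props"].setdefault(prop, set()).add(value)
    return merged
```

For example, records for "Infer.NET" and "infer net" collapse to a single entry whose properties are the union of what each source contributed, which is the consistency guarantee the pipeline is after.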

Microsoft Viva Topics & Alexandria

Project Alexandria’s technology plays a vital role in the architecture of Microsoft’s new platform, Viva Topics. It is one of the four modules of Microsoft Viva, an employee experience platform for communications, learning, resources, and knowledge management. Viva Topics uses AI to organise resources into topics, which are then delivered via applications such as Microsoft Office and SharePoint. Thus, the platform transforms big data into big knowledge.

Microsoft’s different technologies each process rich metadata in their own speciality so that it can contribute to Viva Topics. Speaking on the Alexandria-Viva Topics collaboration, Naomi Moneypenny, who leads Viva Topics product development, said, “The Project Alexandria team and technologies have been instrumental to delivering the innovative experiences for customers in Viva Topics. We value their highly collaborative approach to working with many other specialist teams across Microsoft.”

Wrapping up

Microsoft soon plans to internationalise the knowledge base platform, allowing information in multiple languages to be processed. In addition, the platform aims to automatically translate extracted data from every language.

This will help users customise the knowledge discovery process. It will enable developers to create a knowledge base with an architecture tailored to the needs of each organisation, using the familiar terminology and language that people in the organisation use.


Meenal Sharma
I am a journalism undergrad who loves playing basketball and writing about finance and technology. I believe in the power of words.
