This Theoretical Breakthrough At MIT Could Significantly Increase Data Storage

Linear probing hash tables from CSAIL at the Massachusetts Institute of Technology (MIT) could lead to more efficient computer data storage and retrieval.

A team of MIT researchers has made a breakthrough in the storage and retrieval of data in computers. New research from MIT’s CSAIL on linear-probing hash tables may result in more efficient computer data storage and retrieval. In addition, a theoretical discovery could increase data storage. Data structures, known as “linear-probing hash tables”, are a standard strategy of organising and storing data in computers using data structures. Linear-probing hash tables allow data to be stored at any point along with an array.

For instance, William Kuszmaul, a computer science PhD student at MIT, proposes that the database contains 10,000 social security numbers. “Obtain the social security number x and compute its hash function, h (x), which yields a random number between 1 and 10,000. The following step is to take that random number, h (x). Then, to navigate to that position in the array and insert the social security number x.”

Handling Deletion

The deletion of personal information like social security numbers follows a particular set of rules. It can be difficult to find something later if the hash table is left empty after deleting information. When you’re looking for something, it’s easy to have the impression that there are no other places where you can get it. To overcome this issue, Kuszmaul adds, “one can go to the location of the piece that was removed and place a small marker called a ‘tombstone,’ which shows there was once an element here, but it has since been removed.”

Subscribe to our Newsletter

Join our editors every weekday evening as they steer you through the most significant news of the day, introduce you to fresh perspectives, and provide unexpected moments of joy
Your newsletter subscriptions are subject to AIM Privacy Policy and Terms and Conditions.

Linear Probing Features

However, most people who utilise linear probing hash tables anticipated that when they were too crowded. Linear probing would conduct lengthy stretches of occupied locations in a “cluster” fashion. As a result, the time required to discover an empty seat increases considerably — to the point of quadratic growth — as long as the expectation is false. As a result, individuals have been educated to work with hashtables on a limited capacity basis. They are a strategy that can be costly in terms of the quantity of hardware that a business must purchase and maintain.

However, the study of Kuszmaul and his colleagues, Michael Bender of Stony Brook University, and Bradley Kuszmaul of Google has utterly disproved this long-held notion that has been used to counteract high-load issues. Instead, the linear probing hash table provides great storage capacity without losing performance. Additionally, the team developed a novel approach named “Graveyard Hash.” This hashtable is accomplished by intentionally increasing the number of tombstones in the array until it fills approximately half of the available space. These tombstones provide additional areas for future inscriptions. Kuszmaul says this method is taken in the opposite direction of how people are typically educated and “may result in the best performance in a linear probing hash table.” 


Companies demand new technologies for more effective data storage and retrieval in computers in today’s era of information overload. According to researchers at MIT, linear probing can reintroduce older technology such as linear probing hash tables since they are more efficient. The researchers compared modern and historical storage systems. Along with the traditional method, the researchers developed a new technique called cemetery hashing, which entails artificially increasing the number of tombstones in an array until they occupy approximately half of the available locations. As a result, linear probing will improve data storage, retrieval speed and efficiency from computers.
For more information, refer to the article.

Dr. Nivash Jeevanandam
Nivash holds a doctorate in information technology and has been a research associate at a university and a development engineer in the IT industry. Data science and machine learning excite him.

Download our Mobile App

MachineHack | AI Hackathons, Coding & Learning

Host Hackathons & Recruit Great Data Talent!

AIMResearch Pioneering advanced AI market research

With a decade of experience under our belt, we are transforming how businesses use AI & data-driven insights to succeed.

The Gold Standard for Recognizing Excellence in Data Science and Tech Workplaces

With Best Firm Certification, you can effortlessly delve into the minds of your employees, unveil invaluable perspectives, and gain distinguished acclaim for fostering an exceptional company culture.

AIM Leaders Council

World’s Biggest Community Exclusively For Senior Executives In Data Science And Analytics.

3 Ways to Join our Community

Telegram group

Discover special offers, top stories, upcoming events, and more.

Discord Server

Stay Connected with a larger ecosystem of data science and ML Professionals

Subscribe to our Daily newsletter

Get our daily awesome stories & videos in your inbox