
Now GitHub Wants To Preserve Code Forever, Launches Archive Program


In an attempt to preserve open-source code for future generations, GitHub recently launched its GitHub Archive Program. This week at the GitHub Universe event, the firm announced that it will take a snapshot of every active public repository on 2 February 2020 and store it in the Arctic Code Vault.

In partnership with the Long Now Foundation, the Internet Archive, the Arctic World Archive, and others, GitHub aims to protect code across a wide range of storage media and formats for 1,000 years. Data on conventional media degrades gradually over time, a phenomenon known as bit rot. As a result, critical information is lost over the years from storage devices whose lifespan is only around 30 years.
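To make bit rot concrete, here is a minimal Python sketch of how a stored checksum can reveal a single flipped bit in archived data. It is an illustration only: the function names and the sample payload are hypothetical, and the use of SHA-256 is an assumption, not a description of how GitHub or its partners verify their archives.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate bit rot by flipping one bit of the data."""
    byte_index, offset = divmod(bit_index, 8)
    corrupted = bytearray(data)
    corrupted[byte_index] ^= 1 << offset
    return bytes(corrupted)


def is_intact(data: bytes, expected_digest: str) -> bool:
    """Compare the data against the digest recorded at archive time."""
    return sha256_hex(data) == expected_digest


# Hypothetical payload standing in for an archived source file.
original = b"print('hello, future')\n"
digest = sha256_hex(original)      # recorded when the data is archived

rotted = flip_bit(original, 42)    # one bit decays while in storage

print(is_intact(original, digest))  # the untouched copy still matches
print(is_intact(rotted, digest))    # the decayed copy no longer matches
```

A checksum like this only detects decay. Recovering the data still depends on having another intact copy, which is why the program spreads copies across several kinds of storage.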

Therefore, to help future generations access the building blocks of today’s products, the consortium plans to store data on media that can last for many hundreds of years and mitigate the problem of data decay.

Approach 

GitHub will use a flexible, durable archiving strategy called pace layers. The idea is to have a wide range of storage types that cater to different needs, from real-time access to long-term preservation.

Consequently, the program has been partitioned into hot, warm, and cold storage. Hot storage will serve real-time requirements and hold the current GitHub data, while warm storage will be updated on a monthly basis to archive the repositories. Cold storage is focused on long-term preservation and will be updated every 5+ years.
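The three tiers can be pictured as a small table of refresh intervals. The Python sketch below is only a model of the cadences described above: the class, the field names, and the exact day counts (30 days for "monthly", 5 × 365 days for "5+ years") are illustrative assumptions, not part of any GitHub system.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass(frozen=True)
class StorageTier:
    name: str
    purpose: str
    min_refresh: timedelta  # shortest gap between updates (assumed values)


TIERS = [
    StorageTier("hot", "current GitHub data, real-time needs", timedelta(0)),
    StorageTier("warm", "monthly archive of repositories", timedelta(days=30)),
    StorageTier("cold", "long-term storage", timedelta(days=5 * 365)),
]


def tiers_due(elapsed: timedelta) -> List[str]:
    """Return the tiers whose refresh interval has passed after `elapsed` time."""
    return [tier.name for tier in TIERS if elapsed >= tier.min_refresh]


print(tiers_due(timedelta(days=1)))        # only the hot tier
print(tiers_due(timedelta(days=45)))       # hot and warm
print(tiers_due(timedelta(days=6 * 365)))  # all three tiers
```

The point of the layering is that a failure in one tier does not take the others with it: fast-moving copies stay current, while slow-moving copies change rarely and are built to last.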

Each of the three tiers is further divided into several archives that preserve data by crawling the GitHub site and keeping copies in the Internet Archive, the Bodleian Library, and others. The highlight of this storage, however, is Project Silica, which was developed by the Microsoft Research team. Eventually, all the data of active public repositories will be archived with it to safeguard the code for 10,000 years.

Cold Storage

Each repository will be stored as a single TAR file and encoded as QR codes for data integrity. The data will be kept 250 meters deep in a decommissioned coal mine inside an Arctic mountain in Svalbard. While the location is also affected by climate change, this is not expected to have any impact on the preserved data. Svalbard is considered one of the safest places on the planet, as it faces only a few natural hazards. The location is also used to archive other human heritage, such as the many varieties of seeds held in the Svalbard Global Seed Vault.
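As a rough illustration of the "one TAR file per repository" step, the Python sketch below packs a small throwaway directory into a single uncompressed TAR archive using the standard library and lists what ended up inside. The directory layout, file names, and function names are hypothetical, and the QR-encoding step is deliberately left out, since the article does not describe how it is done.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import List


def archive_repository(repo_dir: Path, out_path: Path) -> None:
    """Pack the whole repository directory into one uncompressed TAR file."""
    with tarfile.open(str(out_path), "w") as tar:
        tar.add(str(repo_dir), arcname=repo_dir.name)


def list_files(tar_path: Path) -> List[str]:
    """Return the sorted names of the regular files stored in the archive."""
    with tarfile.open(str(tar_path), "r") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())


# Build a tiny stand-in repository in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "demo-repo"
    (repo / "src").mkdir(parents=True)
    (repo / "README.md").write_text("# demo\n")
    (repo / "src" / "main.py").write_text("print('hi')\n")

    tar_path = Path(tmp) / "demo-repo.tar"
    archive_repository(repo, tar_path)
    members = list_files(tar_path)

print(members)
```

TAR is a reasonable fit for this kind of archive because it is a simple, long-established format that keeps the directory structure and file contents together in one file.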

How Impactful Will It Be?

While GitHub’s primary intention is to preserve open-source software for future generations, it also wants to encourage other organisations to plan for long-term storage. The firm also cites historical examples of lost knowledge and technology, such as the Library of Alexandria and Roman concrete, where much useful information was lost for want of backups.

However, those were times when information was difficult to move, and a groundbreaking discovery or invention took years or even decades to spread. Thus, only a few people stayed informed about technologies. In the internet era, where information moves at a rapid pace, it is easy to keep track of developments.

Besides, in today’s ever-changing landscape, technologies quickly become irrelevant as newer ones replace them. Today’s open-source projects might not be as useful to future generations as the firm envisions.

Outlook

Storing physical things such as seeds for future generations is a wise idea, as there is a risk of losing something that will always be useful to humans. Technology thousands of years from now will be more advanced than it is today. However, preparing for the worst is not a terrible idea, and software is definitely the way to go.


Rohit Yadav

Rohit is a technology journalist and technophile who likes to communicate the latest trends around cutting-edge technologies in a way that is straightforward to assimilate. In a nutshell, he is deciphering technology. Email: rohit.yadav@analyticsindiamag.com