In an attempt to preserve open-source code for future generations, GitHub recently launched its GitHub Archive Program. This week at the GitHub Universe event, the firm announced that it will take a snapshot of open-source repositories on 2 February 2020 for storage in the Arctic Code Vault.
In partnership with the Long Now Foundation, the Internet Archive, the Arctic World Archive, and others, GitHub aims to protect code across a wide range of storage media and formats for 1,000 years. Stored data gradually degrades over time, a process known as bit rot, and even durable storage devices typically last only around 30 years. As a result, critical information is lost over the years.
Therefore, to give future generations access to the building blocks of today's software, the consortium plans to store data on media that can last for hundreds of years and mitigate the problem of data decay.
GitHub will use a flexible, durable archiving strategy called pace layers. The idea is to combine a wide range of storage types that cater to different needs, from real-time to long-term.
Consequently, the program has been divided into hot, warm, and cold storage. The hot storage will serve real-time requirements and hold current GitHub data, while the warm storage will be updated monthly to archive the repositories. The cold storage is focused on long-term preservation and will be updated every five or more years.
Each of the three tiers relies on several preservation methods, from crawling the GitHub site and keeping copies in the Internet Archive to storing data at the Bodleian Library, among others. The highlight of this effort, however, is Project Silica, developed by Microsoft Research, which stores data in quartz glass and is intended to eventually hold all active public repositories, safeguarding the code for 10,000 years.
Each repository will be packaged as a single TAR file, and the data will be encoded as QR codes. GitHub data will be kept 250 meters deep in a decommissioned coal mine in an Arctic mountain in Svalbard. Although the region is not immune to climate change, the archive is expected to remain unaffected. Svalbard is considered one of the safest places on the planet because it experiences few natural hazards. The archipelago already hosts other efforts to preserve human knowledge, most notably the Svalbard Global Seed Vault, which stores a wide variety of seeds.
How Impactful Will It Be?
While GitHub's primary intention is to preserve open-source software for future generations, it also wants to encourage other organisations to plan for long-term storage. The firm also points to historical examples of lost knowledge, such as the Library of Alexandria and Roman concrete, where humanity lost valuable information for lack of backups.
However, those were times when information was difficult to move, and a groundbreaking discovery or invention took years or even decades to spread. As a result, only a few people stayed informed about new technologies. In today's connected world, where information moves rapidly, it is easy to keep track of developments.
Moreover, in today's ever-changing landscape, technologies quickly become irrelevant as newer ones replace them. Today's open-source projects may therefore not prove as useful as the firm envisions.
Storing physical things such as seeds for future generations is a wise idea, since there is a risk of losing something that will always be useful to humans. Technology thousands of years from now will likely be far more advanced than it is today. Still, preparing for the worst is not a bad idea, and extending that practice to software is a sensible step.