The Python team removed two fake libraries from the Python Package Index (PyPI) after a German developer, Lukas Martini, reported that the packages were stealing sensitive information. Python was released almost three decades ago, but it was widely embraced only in the last few years, driven by the growth of third-party libraries for artificial intelligence and data science.
However, these very libraries could become the prime reason for Python’s downfall. This is not the first time the Python organisation has seen packages infiltrate PyPI to extract information; earlier incidents occurred in July 2019, October 2018, and September 2017.
The Incident
Typosquatting, a form of cybersquatting that takes advantage of typos made by users, was used to deceive developers and gain access to sensitive data. The idea is to register a look-alike of a genuine package name, so that a developer who makes a typo installs the phoney library instead of the desired one. As the fake library is designed to work like the genuine one, developers do not notice any discrepancy.
The two libraries were “jeIlyfish” and “python3-dateutil”, which mimic the popular “jellyfish” and “dateutil” libraries. The fake “jeIlyfish” has a capital “I” in place of the first lowercase “l”, and “python3-dateutil” has the extra word “python3” in its name.
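To make the look-alike trick concrete, here is a minimal sketch of how a name could be screened against a list of genuine packages. Everything in it is an illustrative assumption rather than anything PyPI is known to run: the short `KNOWN_PACKAGES` list, the confusable-character table, the prefix list, and the 0.85 similarity cutoff are all hypothetical. The genuine dateutil is listed under its PyPI distribution name, python-dateutil.

```python
import difflib
from typing import Optional

# Hypothetical allow-list of genuine package names (assumption: a real check
# would need a far larger, curated list).
KNOWN_PACKAGES = ["jellyfish", "python-dateutil", "requests", "numpy"]

# Characters that look alike in many fonts, mapped to one canonical form.
# Applied before lowercasing so that a capital "I" becomes "l", not "i".
CONFUSABLES = str.maketrans({"I": "l", "1": "l", "0": "o"})

# Prefixes that can be bolted onto a real name (assumed list).
PREFIXES = ("python3-", "python-", "py-")


def canonical(name: str) -> str:
    """Reduce a package name to a form in which look-alikes collide."""
    name = name.translate(CONFUSABLES).lower().replace("_", "-")
    for prefix in PREFIXES:
        if name.startswith(prefix):
            return name[len(prefix):]
    return name


# Map each canonical form back to the genuine name it came from.
CANONICAL_TO_KNOWN = {canonical(pkg): pkg for pkg in KNOWN_PACKAGES}


def lookalike_of(name: str) -> Optional[str]:
    """Return the genuine package that `name` appears to imitate, or None."""
    if name in KNOWN_PACKAGES:
        return None  # exact match: this is the genuine name
    reduced = canonical(name)
    if reduced in CANONICAL_TO_KNOWN:
        return CANONICAL_TO_KNOWN[reduced]
    # Fall back to fuzzy matching for ordinary typos such as a doubled letter.
    close = difflib.get_close_matches(
        reduced, list(CANONICAL_TO_KNOWN), n=1, cutoff=0.85
    )
    return CANONICAL_TO_KNOWN[close[0]] if close else None
```

With this sketch, `lookalike_of("jeIlyfish")` points back to `jellyfish` and `lookalike_of("python3-dateutil")` points back to `python-dateutil`, while the genuine names themselves are not flagged. A check like this can only catch imitations of names that are already on the list.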
However, the malicious code was present only in ‘jeIlyfish’ and not in ‘python3-dateutil’. The latter imported the former in its code, which made ‘python3-dateutil’ malicious as well.
It was reported that, once in use, the library downloaded a file named ‘hashsum’, which decoded into a Python file and was executed to exfiltrate SSH and GPG keys from the developer’s computer. The stolen information, including the list of repositories, the home directory, and the PyCharm projects directory, was then sent to http://68.183.212.246:32258.
After the developer reported the issue on 1 December, the Python team removed the libraries to prevent further attacks.
What Are Its Implications
Unlike other prominent programming languages, Python banks on third-party libraries. While this has helped Python proliferate, it comes with many security threats. When a developer installs a library, it usually pulls in modules from different vendors, and those modules can in turn include packages from unknown sources. This chain can have a very long tail, so evaluating every dependency gets tedious.
Although the Python organisation checks new libraries for trustworthiness before integrating them, it cannot guarantee complete safety because the checking is so strenuous. Consequently, one can expect similar occurrences in the future.
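The long tail of dependencies can be made visible with a short script. The sketch below walks the declared requirements of an installed package using the standard-library `importlib.metadata` module. The helper names are hypothetical, the name parsing is deliberately simplified (it does not normalise underscores and hyphens), and optional "extra" requirements are skipped by a plain substring check, so treat it as an illustration rather than a complete dependency resolver.

```python
import re
from importlib import metadata

# Leading project name of a requirement string such as "six>=1.5".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_name(requirement):
    """Extract the lowercased project name from a requirement string."""
    match = _NAME.match(requirement)
    return match.group(1).lower() if match else None


def direct_dependencies(package):
    """Return the names an installed package declares as requirements."""
    try:
        requirements = metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return set()  # not installed locally, so nothing can be inspected
    names = set()
    for requirement in requirements:
        if "extra ==" in requirement:
            continue  # skip optional extras (simplifying assumption)
        name = requirement_name(requirement)
        if name:
            names.add(name)
    return names


def transitive_dependencies(package, lookup=direct_dependencies):
    """Collect every package reachable from `package` through requirements."""
    seen = set()
    stack = [package]
    while stack:
        current = stack.pop()
        for dep in lookup(current):
            if dep != package and dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen
```

Calling `transitive_dependencies("some-package")` on an installed package returns the set of everything it pulls in, directly or indirectly. Each name in that set is code a developer ends up trusting without having chosen it, which is how ‘python3-dateutil’ could deliver the payload that lived in ‘jeIlyfish’.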
Is There A Solution
Python cannot follow the approach of providing most libraries itself, as Google does for Android development. Data science and AI are vast fields, and the Python organisation cannot keep pace with new developments in the landscape.
And without third-party libraries, Python would be like other programming languages in data science and AI, with limited capabilities for manipulating data.
Granting third-party libraries access only with user permission could be a way forward, but it would solve only part of the problem, as packages would still have access to a large amount of information. Manually checking every imported library would also be cumbersome.
Consequently, restricting directory permissions and manual checking both remain ineffective solutions.
Outlook
Third-party libraries have always been a risk, and developers in the past held back from embracing them. But with the rising need for reusable code to innovate quickly in the AI landscape, using them became the new normal. The threat therefore remains, at least in libraries that are not popular among many developers.
Python cannot sustain itself without third-party libraries, but it quickly needs to find a way to detect malicious packages and protect developers from unknowingly adopting malevolent modules.