Fernando Perez and Brian Granger spun off Project Jupyter from IPython in 2014. The name Jupyter refers to the three main programming languages it supports, Julia, Python and R, and is also an homage to the notebooks Galileo kept when he discovered Jupiter's moons. Adoption was rapid: by September 2018, a GitHub analysis counted more than 2.5 million public Jupyter Notebooks. For data scientists, Jupyter became the staple environment and often the first tool they were introduced to in a data science course. Notebooks were a great way to showcase a user's work, since the code and its results sit right next to each other, and they were built for sharing insights easily with colleagues. But even as data scientists grew increasingly reliant on Jupyter Notebooks, researchers remained wary of their deficiencies.
Basic structure of a Jupyter Notebook (Source: Jupyter)
In 2018, Joel Grus, a researcher at the Allen Institute for Artificial Intelligence, made his case against Jupyter Notebooks in a talk titled 'I don't like notebooks' at JupyterCon, the Jupyter developers' conference. Grus acknowledged that notebooks are easy to use and efficient for exploratory data analysis, but argued that they lead to messy code and encourage bad coding habits among data scientists. Users often run cells out of order and then end up frustrated when the results no longer match the code they see on the screen.
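To make the out-of-order problem concrete, here is a toy simulation in Python. It is not how Jupyter's kernel is implemented; it simply treats each cell as a string of source code and executes the cells against one shared namespace, which roughly mirrors how a single kernel keeps state between cell runs. The cell contents and the `run` helper are hypothetical, chosen only for illustration.

```python
# Hypothetical notebook cells, each stored as a snippet of source code.
cells = [
    "x = 10",           # cell 1
    "x = x * 2",        # cell 2
    "result = x + 1",   # cell 3
]

def run(order):
    """Execute the cells in the given order against one shared namespace,
    loosely mimicking a kernel that remembers state between cell runs."""
    namespace = {}
    for i in order:
        exec(cells[i], namespace)
    return namespace["result"]

top_to_bottom = run([0, 1, 2])      # x: 10 -> 20, result = 21
rerun_cell_two = run([0, 1, 1, 2])  # cell 2 run twice: x: 10 -> 20 -> 40, result = 41
print(top_to_bottom, rerun_cell_two)
```

The cell contents are identical in both runs, yet the answer differs. That hidden state, invisible on the page, is what makes a notebook hard to reproduce from its code alone.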
In some ways, Jupyter Notebooks make it harder for data scientists to collaborate on code. When code lives in loose cells rather than in functions or classes, users end up with duplicated code that is confusing and difficult to maintain. Since colleagues often copy snippets from one another's notebooks, those copies can fall out of sync very quickly. Duplication also makes it hard to settle on one version of an answer, or to agree on which notebook holds the definitive figure for a given metric. Notebooks also make it tougher to share plots outside the data science team. Since giving access to the underlying data in big organisations is risky, plots are generally shared externally by copying and pasting them, which becomes cumbersome because it has to be redone every time the data changes.
Not apt for production
Jupyter Notebooks have proven less effective once data scientists move past data exploration into actual production. Building serious data pipelines requires well-structured code that supports practices such as test-driven development. Production also requires reproducing experiments and rerunning notebooks again and again, which Jupyter Notebooks often aren't built for. Notebooks have a non-linear workflow: on the one hand, this makes them interactive, letting users jump back and forth between cells; on the other hand, it produces results that are hard to reproduce, creating further hindrances at the production stage.
Recently, a more serious security issue has reared its ugly head. In March this year, Aqua Security's Team Nautilus discovered Python-based ransomware that used Jupyter Notebooks as a way to access and target environments. Although this has not been confirmed, the researchers suspect the attack originated in Russia. Aqua Security's report noted that because companies mainly use Jupyter Notebooks to analyse data and build data models, the notebooks are an attractive target for security breaches.
Last year, researchers also found cryptocurrency miners abusing Jupyter Notebooks, drawn by the large number of requests these environments process.
Together, these issues are pushing data scientists away from Jupyter towards alternatives such as Deepnote. Some in the research community claim that Deepnote has tackled the problems Jupyter had and resolved a number of them. Deepnote has all the basic functions of Jupyter, but it also offers the luxury of real-time collaboration. With Jupyter, sharing a notebook means downloading it or uploading it to GitHub; Deepnote's free version, by contrast, lets up to three users work on a notebook together.
Deepnote also makes it easy to integrate data from Google Cloud, PostgreSQL and Amazon S3, which makes life much easier for data scientists. The notebook offers an interface where the user fills in a few details and then connects to the source with a single click.
Once the code is written, Deepnote's visualisation feature automatically detects a dataframe and generates a chart, letting the user choose what goes on the X-axis and Y-axis. So instead of writing another piece of code to create a visualisation, data scientists can let Deepnote handle this step. Deepnote has also improved on other features: users can check a notebook's history, review the changes made to it and revert them if they want.
Deepnote's founder and CEO Jakub Jurovych said the startup set out to build a tool specifically for data scientists, with collaboration in mind. Jurovych said every tool he had tried when starting out had failed to impress when it came to collaboration. As the shift has gathered pace, ByteDance, Discord and HR platform Gusto have all adopted Deepnote. "Right now, Deepnote shines in situations where you want to move your work from a playground environment to a more serious or more collaborative setting, for example, your team," Jurovych said.