
Are data scientists moving away from Jupyter Notebooks?

In March this year, Aqua Security’s Team Nautilus discovered Python-based ransomware that used Jupyter Notebooks to access and target other environments.

Fernando Perez and Brian Granger spun off Project Jupyter from IPython in 2014. The name Jupyter is a reference to Julia, Python and R, the three core programming languages the project set out to support, as well as an homage to the notebooks Galileo kept when he discovered Jupiter’s moons. Adoption grew quickly: a GitHub analysis showed that more than 2.5 million public Jupyter Notebooks were on GitHub by September 2018. For data scientists, Jupyter became the staple environment and often the first tool they were introduced to in a data science course. Notebooks were a great way to showcase work, since the code and its results appear right next to each other, and they were built to make sharing insights with colleagues easy. But even as data scientists grew increasingly reliant on Jupyter Notebooks, researchers remained wary of their shortcomings.

Basic structure of a Jupyter Notebook. Source: Jupyter

In 2018, Joel Grus, a researcher then at the Allen Institute for Artificial Intelligence, made his case against Jupyter Notebooks in a talk titled ‘I Don’t Like Notebooks’ at JupyterCon, the Jupyter community conference. While Grus admitted that notebooks were easy to use and good for exploratory data analysis, he argued that they led to messy code and encouraged bad coding habits among data scientists. Because cells can be run in any order, users often execute them out of sequence, leave the notebook in a hidden state that no longer matches the code on the page, and end up frustrated.
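To make the out-of-order problem concrete, here is a minimal sketch that is not from Grus’s talk: each string stands in for a notebook cell, and all cells share one namespace, much as they share one kernel in Jupyter. The cell contents and the `run` helper are hypothetical, chosen only to show how re-running a single cell silently changes the result.

```python
# Hypothetical "cells": each string plays the role of one notebook cell.
cells = {
    1: "prices = [10.0, 20.0, 30.0]",
    2: "prices = [p * 1.1 for p in prices]  # apply a 10% markup",
    3: "total = sum(prices)",
}


def run(order):
    """Execute the cells in the given order in one fresh, shared namespace."""
    namespace = {}
    for cell_id in order:
        exec(cells[cell_id], namespace)
    return namespace["total"]


# Running top to bottom applies the markup once.
top_to_bottom = run([1, 2, 3])
# Re-running cell 2 (say, after tweaking it) applies the markup twice,
# even though the notebook on screen still shows it only once.
rerun_cell_2 = run([1, 2, 2, 3])
print(round(top_to_bottom, 2), round(rerun_cell_2, 2))
```

Nothing in the visible code reveals that the second total is wrong; the discrepancy lives only in the order in which the cells happened to be run.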

Code structuring

In some ways, Jupyter Notebooks make it harder for data scientists to collaborate on code. When users type code into cells instead of organising it into functions or classes, they end up with duplicated code, which quickly becomes confusing and difficult to maintain. Since coders often copy snippets from one another, these copies can fall out of sync very quickly. Duplication also makes it hard to settle on one version of an answer, or to tell which notebook holds the authoritative figure for a given metric. Notebooks also make it tougher to share plots outside the data science team. Because giving access to the underlying data is risky in big organisations, plots are usually shared externally by copying and pasting them, a step that has to be repeated every time the data changes, which is cumbersome.
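One common remedy for copy-pasted snippets is to move the shared logic into a single function that every notebook imports. The sketch below is illustrative only: the function name `average_order_value`, the order format (a list of dicts with an `"amount"` key) and the rule that negative amounts are refunds to be ignored are all assumptions, not something the article specifies.

```python
from statistics import mean


def average_order_value(orders):
    """Return the mean order amount, ignoring refunds (negative amounts).

    Hypothetical shared helper: instead of pasting this logic into several
    notebooks, each notebook would import it from one module.
    """
    amounts = [order["amount"] for order in orders if order["amount"] >= 0]
    if not amounts:
        raise ValueError("no non-refund orders to average")
    return mean(amounts)


orders = [{"amount": 120.0}, {"amount": 80.0}, {"amount": -20.0}]
print(average_order_value(orders))
```

If the refund rule ever changes, it changes in one place, so every notebook that imports the helper reports the same number instead of drifting apart.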

Not apt for production

Jupyter Notebooks have proven less effective once data scientists move past data exploration into actual production. Building serious data pipelines demands well-structured code that supports testing, ideally test-driven development. Production also requires reproducing experiments and rerunning code again and again, which Jupyter Notebooks often aren’t built for. Notebooks have a non-linear workflow: on the one hand, this makes them more interactive, since users can jump back and forth between cells, editing and re-running them; on the other hand, it also produces results that are hard to reproduce, creating further hindrances at the production stage.
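One lightweight guard, assumed here rather than taken from the article, is to check that a saved notebook was run top to bottom before its results are trusted. Saved `.ipynb` files are JSON in the nbformat 4 layout, where each code cell records an `execution_count` (null if the cell was never run). The function names below are hypothetical, and requiring the counts to be exactly 1, 2, 3, … is a deliberately strict rule that matches a fresh “Restart & Run All”.

```python
import json


def execution_order_is_linear(notebook_json: str) -> bool:
    """Return True if the code cells were last run once each, top to bottom.

    A fresh "Restart & Run All" numbers code cells 1, 2, 3, ...; any gap,
    repeat, reordering or unrun cell (execution_count of null) suggests the
    saved outputs may depend on hidden state.
    """
    notebook = json.loads(notebook_json)
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    return counts == list(range(1, len(counts) + 1))


def minimal_notebook(counts):
    """Build a tiny nbformat-4-style notebook whose code cells carry `counts`."""
    cells = [
        {"cell_type": "code", "execution_count": n,
         "metadata": {}, "outputs": [], "source": ""}
        for n in counts
    ]
    return json.dumps({"cells": cells, "metadata": {},
                       "nbformat": 4, "nbformat_minor": 5})


print(execution_order_is_linear(minimal_notebook([1, 2, 3])))
print(execution_order_is_linear(minimal_notebook([3, 1, 5])))
```

A check like this could run in continuous integration before a notebook’s outputs feed a pipeline, though it only detects out-of-order runs; it does not replace moving logic into tested functions.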

Security issues

Recently, a more serious security issue has reared its ugly head. In March this year, Aqua Security’s Team Nautilus discovered Python-based ransomware that used Jupyter Notebooks to access and target other environments. While there is no confirmation yet, researchers believe the attack originated in Russia. A report by Aqua Security noted that because data scientists in companies mainly use Jupyter Notebooks to analyse data and build data models, these environments are attractive targets and susceptible to security breaches.

Last year, Jupyter Notebooks were also found to be abused by cryptocurrency miners looking to hijack their computing resources.

Better alternatives

A mix of these issues is leading data scientists away from Jupyter towards alternatives like Deepnote. Some in the data science community say that Deepnote has tackled many of Jupyter’s problems and resolved a number of them. While Deepnote has all the basic functions of Jupyter, it also offers the luxury of real-time collaboration. Whereas sharing a Jupyter notebook typically means downloading it or uploading it to GitHub, Deepnote lets users work on the same notebook together, and its free version allows up to three users to do so.

Deepnote also makes it easy to integrate data from Google Cloud, PostgreSQL and Amazon S3, which makes life much easier for data scientists. Its interface asks the user to fill in some connection details and then connects to the source with a single click.

Source: Deepnote

Once the code is written, Deepnote’s visualisation feature automatically detects a dataframe and generates a chart, with options for the user to change the x- and y-axes as they wish. So instead of writing another piece of code to create a visualisation, data scientists can let Deepnote handle this step. Deepnote has also improved on other features: users can review a notebook’s history, see the changes that were made and revert them if they want.

Deepnote’s founder and chief executive Jakub Jurovych said the startup set out to build a tool especially for data scientists, with collaboration in mind. Jurovych said that every tool he had tried when starting out had failed to impress when it came to collaboration. As this shift has gathered pace, ByteDance, Discord and HR platform Gusto have all adopted Deepnote. “Right now, Deepnote shines in situations where you want to move your work from a playground environment to a more serious or more collaborative setting, for example, your team,” Jurovych said.


Poulomi Chatterjee

Poulomi is a Technology Journalist with Analytics India Magazine. Her fascination with tech and eagerness to dive into new areas led her to the dynamic world of AI and data analytics.