Branded Content

How Generative AI is Revolutionising Data Science Tools

Einblick Prompt enables users to create complete data workflows using natural language, accelerating various stages of data science and analytics. Einblick has effectively combined the capabilities of a Jupyter notebook with the user-friendliness of ChatGPT.
Since ChatGPT took over newsfeeds last winter, we’ve all seen the rumblings of change caused by generative AI across industries, such as media, art, and education, with mixed results and reactions from veterans in each domain. After many memes and tweets about the poor code generated by the original ChatGPT release, OpenAI came up with the ChatGPT Code Interpreter. 

While the plug-in may have temporarily addressed ChatGPT’s weakness in this area, it’s unlikely that anyone is going to switch from VS Code or Jupyter notebooks to ChatGPT any time soon. ChatGPT is still just a chatbot after all, and it doesn’t have the functionality to rival an IDE or data notebook.

However, this release, among others, raises a few questions. 

Can generative AI really have an impact on a technical field like data science? Is it really going to improve the efficiency of data teams? Is generative AI just the topic of the year, or will it leave a lasting impact?

Boost Your Data Workflow with Einblick’s AI-native Notebook

In order for generative AI to truly transform the data science process, it needs to be embedded directly in the notebooks where data professionals are working. That’s the approach taken by Einblick, a new AI-native notebook born out of research at MIT and Brown University.

Einblick is an AI-native data notebook that can write and fix code, create beautiful charts, build and tune ML models, and much more. As modern data teams continue to evolve, they require an ever-increasing level of speed, agility, and flexibility. Earlier this year, Einblick launched their AI agent, Einblick Prompt, which is embedded in every Einblick workspace. 

With Prompt, users can build out entire data workflows using only natural language. From data cleaning and exploratory data analysis to model building and tuning, Prompt speeds up every aspect of the data science and data analytics process. With Prompt, Einblick has essentially combined the power of a Jupyter notebook with the simplicity of ChatGPT.

While many tech companies are seizing the opportunity to provide automated data processing or code generation services through large language models, Prompt comes from a company whose mission has always been simplifying and optimizing workflows for data teams. As such, Prompt offers several unique benefits with that goal in mind. Let’s take a look at them. 

Context-awareness: Prompt leverages metadata, such as the formatting of column names, dataset names, and data types, so users don’t need to input paragraphs of information to obtain well-commented and tailored code for their dataset and problem.

Automated data processes: Even if you simply instruct Prompt to “Predict survival” using a specific dataset, Prompt will automatically preprocess your data, check for missing values, split your data into training and testing sets, and display evaluation metrics.

Editable code: Since all of Prompt’s code is generated in a data notebook, you can manually edit and test it immediately. Furthermore, Prompt features a “Change this cell” function that allows you to refine the generated code quickly and intuitively using natural language.

One-click bug fixes: If you encounter an error message in Einblick, you can click the “Fix with Prompt” button, and Prompt will debug your error, display where it modified the code, and provide an explanation of the changes made.

Access to LLMs without API keys: Unlike many other apps that require users to provide their own API keys, Einblick manages all of this for you.
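To make the “Predict survival” example above concrete, here is a minimal sketch of the steps such a request implies: checking for missing values, imputing them, splitting into training and testing sets, fitting a model, and reporting a metric. This is not Einblick’s generated code. The toy records, the function names, and the one-feature threshold model are all assumptions made for illustration, and the sketch uses only the Python standard library so it runs anywhere. A real notebook would more likely use pandas and scikit-learn.

```python
import random
from statistics import median

# Hypothetical toy records standing in for a passenger dataset (values are made up).
ROWS = [
    {"age": 22.0, "fare": 7.25, "survived": 0},
    {"age": 38.0, "fare": 71.28, "survived": 1},
    {"age": None, "fare": 8.05, "survived": 0},
    {"age": 35.0, "fare": 53.10, "survived": 1},
    {"age": 54.0, "fare": 51.86, "survived": 0},
    {"age": None, "fare": 30.07, "survived": 1},
    {"age": 4.0, "fare": 16.70, "survived": 1},
    {"age": 20.0, "fare": 8.46, "survived": 0},
]


def count_missing(rows):
    """Return a dict mapping each column to its number of missing (None) values."""
    return {col: sum(r[col] is None for r in rows) for col in rows[0]}


def impute_median(rows, column):
    """Return new rows with missing values in `column` replaced by the column median."""
    fill = median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle reproducibly and split into (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train, feature, target):
    """Pick the threshold on one feature that maximizes training accuracy.

    The resulting rule predicts 1 when feature >= threshold. This is a
    deliberately simple stand-in for a real model.
    """
    best_threshold, best_correct = None, -1
    for t in sorted({r[feature] for r in train}):
        correct = sum((r[feature] >= t) == bool(r[target]) for r in train)
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold


def accuracy(rows, feature, target, threshold):
    """Fraction of rows where the threshold rule matches the target."""
    hits = sum((r[feature] >= threshold) == bool(r[target]) for r in rows)
    return hits / len(rows)


missing = count_missing(ROWS)                    # 1. check for missing values
clean = impute_median(ROWS, "age")               # 2. preprocess
train, test = train_test_split(clean)            # 3. split the data
threshold = fit_threshold(train, "fare", "survived")  # 4. fit a model
print("missing values per column:", missing)
print("test accuracy:", accuracy(test, "fare", "survived", threshold))  # 5. evaluate
```

Even in this toy form, a two-word request expands into five distinct steps, which is the manual work the article describes Prompt automating.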

Prompt also integrates with Einblick’s other core features, which include the following.

Multimodal workflows: Combine Python, SQL, and interactive components like Charts, Tables, and Filters in the same workspace.

2-D canvas layout: Prototype visualizations and models more efficiently, as workflows can be easily arranged side-by-side, reducing the need to scroll through numerous Python cells.

Fully managed, web-based platform: No more time wasted on configuring your environment or ensuring everyone is on the same page. Presenting and sharing your work has never been easier than with Einblick.

The question of the efficacy and relevance of generative AI in the data space is non-trivial, and it’s exciting to see a startup like Einblick addressing the issue in a meaningful and impactful manner. For a platform built by a data team for data teams, embedding generative AI simply extends a product that was already focused on making data teams’ lives easier.

Beware Tech Giants Tacking on Generative AI

Although large players like OpenAI, Google, GitHub, and Project Jupyter have been investing in new features to accommodate generative AI, for many of them, creating an efficient and impactful platform for data science and data analytics workflows is not the core of what they offer. These organizations have certainly created transformative products that we still use to this day, but users should still scrutinize the quality of their new releases.

In May 2023, Google Colab, Google’s web-based Python notebook, announced that AI coding functionality would be added to the notebooks, leveraging a family of Google’s code models, Codey. There has been little news other than previews from Google, but GitHub and Jupyter users can test out the benefits of AI-generated code now.

In fact, GitHub, the platform for version control, took its cloud-based AI tool, GitHub Copilot, out of technical preview just last year in June 2022. Copilot is available in popular code editors like Visual Studio Code and Neovim, providing users with automatic code completion. When using one of these code editors, you can prompt Copilot to offer suggestions by entering code comments, but even then, Copilot tends to generate code one line at a time. If Copilot is not offering quality suggestions, you’ll have to start writing out code manually to get it back on track. Keep in mind, however, that Copilot cannot infer any information about your data, so you’ll spend a lot of time editing any generated code for small things like casing, spelling, hyphens, and underscores. The product certainly seems geared more towards software engineers, GitHub’s traditional market, than towards data scientists and data teams. GitHub has announced a chat assistant feature, which is in closed public beta at the time of writing.

Similarly, Project Jupyter, the originator of the Python notebook, created Jupyter AI and Jupyternaut. The former is available anywhere an IPython kernel runs, allowing users to query large language models within Python notebooks to generate code from natural language prompts, as well as to get explanations of errors. Jupyternaut is available only in JupyterLab and functions as a chatbot within the platform. Users can ask Jupyter AI to generate code by naming a specific model with the cell magic %%ai chatgpt and then supplying a natural language prompt. As with Copilot, the lack of context awareness means that much of the generated code is boilerplate, so users will spend time manually editing it. With Jupyter AI, you can also type in a much longer natural language query that provides context to the AI, but that can be time-consuming. As Jupyter AI is a relatively new release, features like its error explanations are also a bit buggy, but they can help users who don’t want to switch between tabs constantly.
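The context that users would otherwise type out by hand can, in principle, be assembled automatically from the dataset itself. The sketch below illustrates that idea only. It is not how Einblick Prompt, Copilot, or Jupyter AI work internally, and the function names (infer_type, describe_dataset, build_prompt) and the sample records are hypothetical.

```python
def infer_type(values):
    """Guess a coarse type label from the non-missing values of a column."""
    present = [v for v in values if v is not None]
    if not present:
        return "unknown"
    if all(isinstance(v, bool) for v in present):
        return "boolean"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numeric"
    return "text"


def describe_dataset(name, rows):
    """Summarize the dataset name, column names, types, and missing counts as text."""
    lines = [f"Dataset: {name} ({len(rows)} rows)"]
    for col in rows[0]:
        values = [r.get(col) for r in rows]
        missing = sum(v is None for v in values)
        lines.append(f"- {col}: {infer_type(values)}, {missing} missing")
    return "\n".join(lines)


def build_prompt(request, name, rows):
    """Prepend the metadata summary to a short natural language request."""
    return (
        f"{describe_dataset(name, rows)}\n\n"
        f"Task: {request}\n"
        "Use the exact column names listed above."
    )


# Hypothetical records with inconsistent casing, hyphens, and underscores.
rows = [
    {"Passenger_Id": 1, "home-town": "Oslo", "Fare": 7.25},
    {"Passenger_Id": 2, "home-town": None, "Fare": 71.28},
]
prompt = build_prompt("Plot fare by home town", "passengers", rows)
print(prompt)
```

Because the column names are passed along verbatim, a model receiving this prompt has what it needs to get casing, hyphens, and underscores right without the user spelling them out.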

Special Use-Case: AI Charts for Everyone

Although applications of generative AI are still being explored, natural language interaction offers many opportunities for data teams and individuals to make parts of their work more efficient. One key example is the space of data visualization. Generative AI can simplify tasks such as adjusting color palettes and label formatting through verbal queries. Leveraging users’ instinct for natural language will save a lot of time compared to the manual configuration demanded by conventional tools. Additionally, rapid prototyping and experimentation are integral to data visualization, and with tools like Einblick Prompt, users can simply ask the AI to replicate and modify specific parts of a chart, accelerating the process and generating multiple versions swiftly.

Traditional BI tools like Excel and Tableau removed the need to program, but squeezing the flexibility of code into hundreds of preset toggles, settings, and options was a challenge. As a result, users can get bogged down in numerous steps and limited by the design of the tool. In contrast, AI-driven charting can translate verbal descriptions into the desired charts, offering an entryway to the flexibility of code.

In fact, the makers of Einblick, seeing the opportunity in data visualization, recently launched ChartGen AI, a free, standalone app that allows users to go from text to chart in seconds. No account is required. Users just need to upload a dataset or link to a Google Sheet, type in their natural language prompt, and then see their data come to life. From scatter plots to histograms to pie charts and more, ChartGen AI can build anything the user describes.

Find Your Data’s Northstar

There are many exciting advancements in the field of generative AI. However, users must ensure that they are investing in the right tool for themselves and their teams. Not all tools have been designed from the outset for data tasks. Therefore, even though major companies were among the first to offer generative AI tools for coding, they may not necessarily be the best choice. The implementation of generative AI will be influenced by the larger company’s mission, audience, and purpose — all of which may be completely unrelated or, at best, tangential to the work of data teams. 

To fully harness the power of generative AI, users should consider alternative tools like Einblick. This is especially true when compared to tech giants, whose primary focus has not always been serving data analysts and data scientists.

Contributed as part of AIM Branded Content.

This article is contributed by
Shritama Saha

Shritama Saha is a technology journalist who is keen to learn about AI and analytics. A graduate in mass communication, she is passionate about exploring the influence of data science on fashion, drug development, films, and art.
