Today, data science teams the world over are leveraging generative AI to reap the benefits of the technology that has everyone’s rapt attention. Not one to miss out on the transition, Yuvaneet Bhaker, principal data scientist at Fractal, told AIM that his team too has been leveraging generative AI to stay ahead of the curve.
Bhaker agrees that generative AI enables both direct and indirect applications for working with structured data. Popular direct applications include using natural language for structured database queries and generating various data types, such as tabular, hierarchical, graph, and time-series data. LLM embeddings are also highly valuable, aiding classification tasks and the detection of rare events such as anomalies, fraud, and piracy within datasets.
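One way embeddings can surface rare events is by flagging records whose vectors sit far from everything else. The sketch below uses invented toy vectors and a nearest-neighbour cosine check; in practice the vectors would come from an LLM embedding endpoint, and the threshold and record names are purely illustrative.

```python
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return num / den

# Toy transaction-description "embeddings"; the vectors here are invented
# for illustration, not produced by a real model.
records = {
    "grocery purchase":   [0.90, 0.10, 0.00],
    "fuel purchase":      [0.80, 0.20, 0.10],
    "restaurant payment": [0.85, 0.15, 0.05],
    "wire to offshore":   [0.10, 0.10, 0.95],
}

def flag_anomalies(records, threshold=0.8):
    """Flag records whose nearest neighbour (by cosine similarity)
    is still dissimilar -- a crude embedding-based anomaly check."""
    flagged = []
    for name, vec in records.items():
        best = max(cosine(vec, v) for n, v in records.items() if n != name)
        if best < threshold:
            flagged.append(name)
    return flagged
```

Here the three ordinary purchases cluster together, while the unusual wire transfer has no close neighbour and gets flagged.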
Indirect applications, on the other hand, facilitate data scientist workflow planning and enhancement. Generative AI offers the ability to produce recommendations, suggesting which features to implement, along with generating implementations (co-pilots) and creating documentation to support these processes.
In this exclusive interaction with Analytics India Magazine, Bhaker contrasts generative AI applications on structured versus unstructured data and delves into how Fractal’s data science team has been leveraging the technology.
Can you tell us how the recent generative AI trend has impacted Fractal?
Bhaker: Fractal has been actively engaged in generative AI. This has accelerated over the past year, during which we’ve witnessed significant transformations in the field. The surge in generative AI’s capabilities has spurred increased investment and exploration, particularly from our clients. We are developing tools for our clients that can boost productivity for internal users and enhance customer experiences.
Fractal provides thought leadership and guidance on generative AI. We help our clients with the discovery and implementation of use cases that can make a real impact. We are developing products, accelerators, and best practices for working with generative AI solutions.
Our primary research focus is on enhancing the reliability, explainability, and readiness of GenAI solutions. We blend design, engineering, and AI, catering to end-users’ needs while emphasising speed, reliability, and accuracy. Incorporating domain expertise is another crucial element in this journey.
How is the data science team at Fractal leveraging generative AI? What are the current use cases? Also, which LLMs are you currently using?
Bhaker: Our generative AI team works across multimodal use cases encompassing text, images, videos, structured and unstructured data. We incorporate design and behavioral science concepts to ensure our outputs have a meaningful impact on customers.
One of our primary use cases involves data querying, whether the data is structured or unstructured. This entails searching for relevant information and generating insights or summaries in a contextually useful format. The format varies depending on the end-user, such as a relationship manager or a risk team.
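For structured data, this kind of querying typically means translating a natural-language question into SQL and executing it. The sketch below stubs out the model call with a canned response (the `llm_generate_sql` function name and the tiny schema are hypothetical); a real system would prompt an LLM with the schema and the user’s question.

```python
import sqlite3

# Stand-in for an LLM call; the function name and canned SQL are
# illustrative only -- a real system would prompt a model here.
def llm_generate_sql(question, schema):
    return ("SELECT customer, SUM(amount) AS total FROM txns "
            "GROUP BY customer ORDER BY total DESC LIMIT 1")

# Minimal in-memory table standing in for a client's structured data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txns (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO txns VALUES (?, ?)",
                 [("acme", 120.0), ("acme", 80.0), ("globex", 150.0)])

sql = llm_generate_sql("Which customer spent the most?",
                       "txns(customer, amount)")
top = conn.execute(sql).fetchone()
```

The answer can then be rendered differently for each end-user, such as a one-line summary for a relationship manager or a detailed breakdown for a risk team.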
Internally, our data science team uses generative AI for enhanced productivity, from copilot tools for structured data insights to code translation between programming languages, showcasing the diverse range of applications for such solutions.
We are developing frameworks agnostic to any specific LLM, making it easier to upgrade or switch as the technology improves. Simultaneously, we are partnering with LLM providers to configure each implementation to extract the best possible performance. Data security and performance are key criteria for us in this regard.
What are the key differences between using generative AI for structured data compared to unstructured data, and what unique challenges does structured data present?
Bhaker: LLMs can process both structured and unstructured data, in part or as a whole. It is possible for LLMs to extract information from unstructured data and shape it into a desired format.
High-quality structured data simplifies insight generation, pattern discovery, and prediction. It enables the use of data processing and insight generation tools like SQL and Python.
There are some challenges in working with structured data, such as granularity, missing values, and dimensionality. However, LLMs have shown greater flexibility in addressing some of these challenges. Generative AI solutions offer the potential to improve the quality of existing data while accommodating the idiosyncrasies of ambiguously formatted data, without sacrificing performance.
Historically, the differences manifested through separate use cases (for example, translation vs forecasting), data processing techniques (tokenisation vs feature engineering), training approaches (transfer learning vs building models from scratch every time), as well as evaluation methodologies.
But now generative AI empowers us to approach problems with a multimodal perspective, seamlessly incorporating both structured and unstructured data. These models draw upon extensive open-source data and possess a comprehensive ‘worldview’ infused with knowledge. For instance, consider the category ‘credit card’ in structured data. Previously, it was typically represented as a one-hot encoded variable, lacking a nuanced understanding. However, generative AI comprehends the concept of a credit card, bridging the gap between structured data and real-world knowledge.
Going forward, the lines between structured and unstructured problems will become increasingly blurred.
What are the challenges and learnings when it comes to implementing generative AI for structured data?
Bhaker: When we initially implemented these solutions, we noticed that they possessed the ability to generate responses that appeared quite plausible and appealing to human readers. However, when domain experts, such as physicists, programmers, mathematicians, or corporate banking specialists, posed questions within their respective fields, it became apparent that while the responses seemed plausible, they weren’t always accurate. This posed a significant risk, as relying on these responses for decision-making could lead to undesired consequences.
Furthermore, most user interactions with generative AI occur in a chatbot-like fashion at the end of the entire process. However, our goal was to employ generative AI more upstream to automate various tasks, including generating complete blocks of code or insight automation. This introduced additional challenges. Our primary learning from this experience has been centered around enhancing the critique and validation capability of our solution.
We aim to incorporate reasoning abilities to help it understand its limits and clearly communicate uncertainties. We want it to be able to say, ‘I think this is the answer, but I’m not entirely certain’. By verifying the compatibility of queries with available information, we aim to reduce the need for speculative or inaccurate responses. Our ongoing efforts are focused on experimenting with each entity in our solution to enhance its robustness and scalability.
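One lightweight form of this compatibility check is to verify that a generated query only references columns the schema actually contains before running it. The sketch below is a rough illustration of the idea, not Fractal’s implementation; the regex-based identifier extraction is deliberately crude and not a real SQL parser.

```python
import re

# Keywords to ignore when pulling identifiers out of generated SQL
# (a deliberately small, illustrative list).
SQL_KEYWORDS = {"select", "from", "where", "group", "by", "order",
                "sum", "avg", "count", "limit", "as", "desc", "asc"}

def referenced_identifiers(sql: str) -> set:
    # Very rough identifier extraction; real systems would parse the SQL.
    return set(re.findall(r"[a-z_]+", sql.lower())) - SQL_KEYWORDS

def can_answer(sql: str, known: set) -> bool:
    """Return False when the query references tables or columns we do
    not have, so the system can say 'I am not certain' rather than
    return a speculative answer."""
    return referenced_identifiers(sql) <= known

schema = {"txns", "customer", "amount"}
```

A query like `SELECT customer FROM txns` passes, while `SELECT salary FROM txns` fails the check, and the system can decline instead of fabricating a result.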
Are there any ethical considerations or potential risks associated with the use of generative AI for structured data?
Bhaker: Generative AI poses a risk of producing deceptive outputs with a high level of confidence, potentially misleading users by generating seemingly authentic content. Addressing these concerns requires a transparent approach. Proper attribution and citation, along with independent, competent criticism and adversarial generative AI solutions, can be useful. Human domain experts will play a critical role in developing such evaluation and benchmarking capabilities.
Data bias and fairness issues are inherited from the training data; an emphasis on diverse datasets and balancing may help. Privacy concerns arise because sensitive PII data can appear at both the training and querying stages; such information must be carefully masked or removed. Security risks can be addressed by ensuring proper access control. Generated data can potentially be misused, so it is important to identify and tag it, and to raise awareness to prevent unwanted consequences.
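A masking pass of the kind described above can be sketched as a few pattern substitutions applied before text reaches a model. This is a minimal illustration only; the patterns are simplistic, and production pipelines would use a dedicated PII-detection service rather than hand-rolled regexes.

```python
import re

# Illustrative patterns for a few common PII shapes; real pipelines
# would rely on a proper PII-detection service with far broader coverage.
PII_PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"), "[CARD]"),
    (re.compile(r"\b\d{10}\b"), "[PHONE]"),
]

def mask_pii(text: str) -> str:
    """Replace matched PII spans with placeholder tokens."""
    for pattern, token in PII_PATTERNS:
        text = pattern.sub(token, text)
    return text
```

Applying the same pass at both the training and querying stages keeps sensitive values out of prompts and logs alike.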
What are some of the challenges for Fractal at large, when it comes to leveraging generative AI at scale?
Bhaker: Initial challenges include the cost and rate limitations of LLM APIs. For the right reasons, I am hoping they will become incrementally cheaper, faster, and better.
The second challenge revolves around evaluating these solutions, particularly concerning specific tasks. It’s crucial to ensure the reliability of responses. We’ve observed that numerous teams, each specialising in different tasks, can benchmark and validate the responses, enhancing their dependability.
The third challenge pertains to infrastructure, including the expensive GPUs and cloud infrastructure required to host LLMs. At Fractal, we’ve adopted a platform approach. Dedicated teams work on optimising the performance of hosted LLMs, making them available within our ecosystem. This approach leverages expertise in effectively working with these extensive models.
Our approach involves various teams working at different stages — optimising responses, managing infrastructure and APIs, creating a platform for downstream applications, and developing solutions to enhance productivity for our clients. Additionally, our design, behavioral sciences, and domain expert teams contribute their knowledge to make these applications more valuable to our clients.