What’s Next in Generative AI?

OpenCV chief Satya Mallick believes we should look at AI as a collaborator, not a competitor, that humans can leverage to be more productive.

Generative AI is like having a personal creative genius by your side. With its remarkable ability to analyze patterns and develop new content based on them, generative AI can create everything from stunning digital art to original music compositions, human-like text and much more.

However, the booming space of generative AI also brings with it the complex issue of piracy and copyright infringement in AI art. Despite that, the segment has seen phenomenal growth over the past two years.

In an exclusive interview with Analytics India Magazine, Dr Satya Mallick, chief executive officer of OpenCV, said he believes the biggest breakthrough in generative AI is the development of large language models, or foundation models. He noted that transformer models, such as those used in vision transformers, are a significant innovation in this area.

According to Mallick, what's next for generative AI is handling multiple types of input and producing multimedia output; in other words, a multimodal approach.

Microsoft recently introduced a multimodal large language model (MLLM) called Kosmos-1. AI research studio Alethea.AI unveiled CharacterGPT, which generates characters from text. Two years ago, Google AI also released MURAL (Multimodal, Multitask Representations Across Languages), a model for image–text matching. It applies multitask learning to image–text pairs in combination with translation pairs covering over 100 languages.

However, Mallick pointed to two fundamental limitations. The first is data: how much of it can be obtained, and whether there is a way to avoid the need to annotate it. The second is the dearth of computational power, although he expects available compute to increase in the future.

Mallick, an IIT Kharagpur alumnus, is also the founder of California-based computer vision company Big Vision. Back in 2006, when few people were aware of AI or its immense potential, Mallick co-founded TAAZ, a computer vision company that built vision and learning solutions for the beauty and fashion industry.

OpenCV, an open-source computer vision and machine learning software library, was started at Intel in 1999. Gary Bradski, a former computer vision engineer at Intel, developed the first iterations of OpenCV there with a team of engineers, predominantly from Russia. In 2002, the team released version 0.9 of the software as open source.

OpenCV recently launched two new courses on generating art efficiently with AI as part of its Kickstarter campaign. The first course, ‘AI Art Generation for Everyone’, does not require any background in AI or programming, while the second, ‘Advanced AI Art Generation’, requires basic programming knowledge.

AI-generated art has the power to revolutionize the art world and unearth unexplored possibilities. However, it also introduces a complex challenge of piracy and copyright infringement, raising concerns around ownership and intellectual property.

Recently, image generation companies like Midjourney and Stability AI were sued for using artists’ works to train their generative AI algorithms, a practice that has infuriated the artist community. Meanwhile, Shutterstock has taken a more responsible stance by introducing its own AI tool, in contrast to Getty Images, which forbids the use of its photos in generative AI artwork.

Dr Mallick drew parallels between the copyright threats facing generative AI today and those YouTube faced in its early years. He said that a solution like YouTube’s, where a big company such as Google stepped in, negotiated deals and paid copyright holders, could work here.

ChatGPT vs DALL-E

OpenAI’s popular chatbot, ChatGPT, has become a household name, garnering over 100 million users in less than three months. As of February 2023, ChatGPT gets over 25 million daily visits. But text-to-image models like OpenAI’s DALL-E and Stability AI’s Stable Diffusion have seen a starkly lower adoption rate than ChatGPT.

Mallick explained that one of the main reasons ChatGPT has such a high adoption rate is that writing is a primary skill needed in every job, whether you are a coder, an author or a social media manager. Even Coca-Cola is employing generative AI for marketing with the help of OpenAI and Bain & Company.

“The three primary skills taught in elementary school are reading, writing and arithmetic, not art or photography, as these are high-level skills,” he said. “Plus, training an NLP model on text is easier, as it is less computationally intensive than training on image data.”

Meanwhile, generative AI is consolidating and becoming more sophisticated as researchers combine different techniques and approaches. By combining the strengths of NLP and computer vision, Stable Diffusion models represent a significant step forward in generative AI.

Traditional generative models like generative adversarial networks (GANs) were limited in their ability to understand the world because they lacked a notion of language. While GANs could create realistic-looking images, each had to be trained on a specific data set, such as images of human faces or cats.

In contrast, Stable Diffusion models draw on knowledge learned from text data to understand how words cluster together and how they relate to the world. This allows them to generate more complex and varied images without being restricted to a narrow, domain-specific data set.

“Stable Diffusion models are a significant advancement in generative AI precisely because they do not rely on supervised learning. By leveraging the knowledge gained from unsupervised learning, these models can generate complex and varied images without requiring manual labeling of data, which makes them more flexible,” he said.

Shritama Saha
Shritama is a technology journalist who is keen to learn about AI and analytics. A graduate in mass communication, she is passionate about exploring the influence of data science on fashion, drug development, films and art.