What’s Next in Generative AI?

OpenCV chief Satya Mallick believes we should see AI as a collaborator that humans can leverage to be more productive, rather than as a competitor

Generative AI is like having a personal creative genius by your side. With its remarkable ability to analyze patterns and develop new content based on them, generative AI can create everything from stunning digital art to original music compositions, human-like text and much more.

However, generative AI also brings with it the complex issue of piracy and copyright infringement in AI art. Despite that, the segment has seen phenomenal growth over the past two years.

In an exclusive interview, Dr Satya Mallick, the chief executive officer at OpenCV, told Analytics India Magazine that he believes that the biggest breakthrough in generative AI is the development of large language models or foundation models, noting that transformer models, such as those used in vision transformers, are a significant innovation in this area.

According to Mallick, what is next in store for generative AI is multiple inputs and multimedia output. In other words, a multimodal approach.

Microsoft recently introduced a multimodal large language model (MLLM) called Kosmos-1. AI research studio Alethea.AI unveiled CharacterGPT, which generates characters from text. Two years ago, Google AI also released the MURAL: Multimodal, Multitask Representations Across Languages model for image-text matching. It deploys multitask learning applied to image–text pairs in combination with translation pairs covering over 100 languages.

However, Mallick said, “It comes with the two fundamental limitations, including how much data can be obtained – whether there is a way to avoid the need for annotating data and the dearth of computational power – although it is expected to increase in the future”.

Mallick, an IIT-Kharagpur alumnus, is also the founder of California-based computer vision company Big Vision. Back in 2006, when AI and its potential were far less widely known, Mallick co-founded TAAZ, a computer vision company that created vision and learning solutions for the beauty and fashion industry.

OpenCV, an open-source computer vision and machine learning software library, was founded by Intel in 1999. Gary Bradski, a former computer vision engineer at Intel, developed the first iterations of OpenCV there with a team of engineers, predominantly from Russia. In 2002, they released version 0.9 of the software as open-source.

The company recently launched two new courses as part of its ‘Kickstarter campaign’ on how to efficiently generate art with AI. The first course, ‘AI Art Generation for Everyone’, does not require any background in AI or programming, while the second course, ‘Advanced AI Art Generation’, requires basic programming knowledge.

AI-generated art has the power to revolutionize the art world and unearth unexplored possibilities. However, it also introduces a complex challenge of piracy and copyright infringement, raising concerns around ownership and intellectual property.

Recently, image generation platforms like Midjourney and Stability AI were sued for using artists’ works to train their generative AI algorithms, infuriating the artist community. Meanwhile, Shutterstock has taken a more responsible stance by introducing its own AI tool, in contrast to Getty Images, which has forbidden the use of its photos in generative AI artwork.

Dr Mallick drew parallels between YouTube’s early years and the current situation with copyright threats. He said that a solution similar to YouTube’s, with a big company like Google coming into the picture, negotiating deals and paying copyright holders, could work here.

ChatGPT vs DALL-E

OpenAI’s popular chatbot, ChatGPT, has become a household name by garnering over 100 million users in less than three months. As of February 2023, ChatGPT gets over 25 million daily visits. But there is a stark difference in the adoption rate of text-to-image models like OpenAI’s DALL-E or Stability AI’s Stable Diffusion when compared to ChatGPT.

Mallick explained that one of the main reasons ChatGPT has such a high adoption rate is that writing is a primary skill needed in every job, whether you are a coder, author or social media manager. Even Coca-Cola is employing generative AI for marketing with the help of OpenAI and Bain & Company.

“The three primary skills taught at the elementary school are – reading, writing, and arithmetic, not art or photography as these are high-level skills. Plus, training an NLP model on text is easier as it is less computationally intensive than image data.”

Besides, generative AI is consolidating and becoming more sophisticated as researchers combine different techniques and approaches. By leveraging the strengths of NLP and computer vision, Stable Diffusion models represent a significant step forward in generative AI. 

Traditional generative models like generative adversarial networks (GANs) were limited in their ability to understand the world because they lacked a notion of language. While GANs could create realistic-looking images, they needed to be trained with specific data sets, such as images of human faces or cats. 

In contrast, Stable Diffusion models leverage the knowledge gained from text data to understand how words cluster together and relate to the world. This allows them to generate more complex and varied images without relying on specific data sets.

“Stable diffusion models are a significant advancement in generative AI, precisely because they do not rely on supervised learning. By leveraging the knowledge gained from unsupervised learning, these models can generate complex and varied images without requiring the manual labeling of data, making it more flexible,” he said. 

Shritama Saha

Shritama (she/her) is a technology journalist at AIM who is passionate about exploring the influence of AI on different domains including fashion, healthcare and banking.