Generative AI is like having a personal creative genius by your side. With its remarkable ability to analyze patterns and develop new content based on them, generative AI can create everything from stunning digital art to original music compositions, human-like text and much more.
However, generative AI also brings with it the complex issue of piracy and copyright infringement in AI art. Despite that, the segment has grown phenomenally over the past two years.
In an exclusive interview, Dr Satya Mallick, the chief executive officer at OpenCV, told Analytics India Magazine that the biggest breakthrough in generative AI is the development of large language models, or foundation models. He noted that transformer models, such as those used in vision transformers, are a significant innovation in this area.
According to Mallick, what is next in store for generative AI is multiple inputs and multimedia output: in other words, a multimodal approach.
Microsoft recently introduced a multimodal large language model (MLLM) called Kosmos-1. AI research studio Alethea.AI unveiled CharacterGPT, which generates characters from text. Two years ago, Google AI also released the MURAL: Multimodal, Multitask Representations Across Languages model for image-text matching. It deploys multitask learning applied to image–text pairs in combination with translation pairs covering over 100 languages.
However, Mallick said, “It comes with the two fundamental limitations, including how much data can be obtained – whether there is a way to avoid the need for annotating data and the dearth of computational power – although it is expected to increase in the future”.
Mallick, an IIT-Kharagpur alumnus, is also the founder of California-based computer vision company Big Vision. Back in 2006, when no one really knew about AI or its immense potential, Mallick had co-founded TAAZ – a computer vision company that created vision and learning solutions for the beauty and fashion industry.
OpenCV, an open-source computer vision and machine learning software library, was founded by Intel in 1999. Gary Bradski, a former computer vision engineer at Intel, developed the first iterations with a team of engineers, predominantly from Russia. In 2002, they released version 0.9 of the software as open source.
The company recently launched two new courses as a part of their ‘Kickstarter campaign’ on how to efficiently generate art with AI. The first course, ‘AI Art Generation for Everyone’, does not require any background in AI or programming, while the second course, ‘Advanced AI Art Generation’, requires basic programming knowledge.
Copyright & IP Concerns
AI-generated art has the power to revolutionize the art world and unearth unexplored possibilities. However, it also introduces a complex challenge of piracy and copyright infringement, raising concerns around ownership and intellectual property.
Recently, image generation platforms like Midjourney and Stability AI were sued for using artists’ works to train their generative AI algorithms, infuriating the artist community. Meanwhile, Shutterstock has taken a more responsible stance by introducing its own AI tool, in contrast to Getty Images, which has forbidden the use of its photos in generative AI artwork.
Dr Mallick drew parallels between YouTube’s early years and the current copyright disputes. He said that a solution similar to YouTube’s, with a big company like Google coming into the picture, negotiating deals and paying copyright holders, could work here.
ChatGPT vs DALL-E
OpenAI’s popular chatbot, ChatGPT, has become a household name by garnering over 100 million users in less than three months. As of February 2023, ChatGPT gets over 25 million daily visits. But there is a stark difference in the adoption rate of text-to-image models like OpenAI’s DALL-E or Stability AI’s Stable Diffusion when compared to ChatGPT.
Mallick explained that one of the main reasons ChatGPT has such a high adoption rate is that writing is a primary skill needed in every job, whether you are a coder, author or social media manager. Even Coca-Cola is employing generative AI for marketing with the help of OpenAI and Bain & Company.
“The three primary skills taught at the elementary school are – reading, writing, and arithmetic, not art or photography as these are high-level skills. Plus, training an NLP model on text is easier as it is less computationally intensive than image data.”
Besides, generative AI is consolidating and becoming more sophisticated as researchers combine different techniques and approaches. By leveraging the strengths of NLP and computer vision, Stable Diffusion models represent a significant step forward in generative AI.
Traditional generative models like generative adversarial networks (GANs) were limited in their ability to understand the world because they lacked a notion of language. While GANs could create realistic-looking images, they needed to be trained with specific data sets, such as images of human faces or cats.
In contrast, Stable Diffusion models leverage the knowledge gained from text data to understand how words cluster together and relate to the world. This allows them to generate more complex and varied images without relying on specific data sets.
“Stable diffusion models are a significant advancement in generative AI, precisely because they do not rely on supervised learning. By leveraging the knowledge gained from unsupervised learning, these models can generate complex and varied images without requiring the manual labeling of data, making it more flexible,” he said.