Last month, a group of Cosmopolitan editors, alongside digital artist Karen X. Cheng and members of artificial intelligence research lab OpenAI, created the first-ever magazine cover generated by artificial intelligence, using DALL-E 2.
Recently, OpenAI’s GPT-3 also wrote a research paper about itself. It is listed as one of the paper’s main authors – ‘GPT Generative Pretrained Transformer’ – alongside Almira Osmanovic Thunström and Steinn Steingrimsson.
In the past, there have been several instances of GPT producing human-like text. It has written news articles and poems, produced books in 24 hours, generated new content in the style of deceased authors, and even mimicked Chetan Bhagat, a famous Indian author.
At first glance, these feats look quite intriguing. However, they call for clarity around credibility and the possible bypassing of restrictions on commercial use of the works on which OpenAI’s DALL-E 2 and GPT-3 may have been trained.
It also raises a bigger question: where is the ‘I’ in AI anymore? Should GPT-3 or DALL-E 2 be given that much credit when it is humans who have been doing all the thinking (writing the prompts), and when issues around compositionality, bias, and more remain unresolved? Where do we draw the line?
Cheng said that there was a ton of human involvement and decision-making. “While each attempt takes only 20 seconds to generate, it took hundreds of attempts. Hours and hours of generating and refining prompts before getting the perfect image,” she added.
She said that the natural reaction is to fear that AI will replace human artists, a thought that crossed her mind as well. However, working with DALL-E removed all such doubts. She said that instead of a replacement, DALL-E comes across as an ‘instrument to play’ for humans.
She likened it to learning a musical instrument – you improve with practice. Cheng says she has spent over 100 hours ‘playing’ with the tool and is now adept at recognising the right keywords to generate a specific image. She has also been conversing with other DALL-E artists on Twitter and Discord. “I learned from other artists that you could ask for specific camera angles. We are all figuring out together how to play this beautiful new instrument,” she added.
Not as smart as you think
Some AI leaders seem to agree that DALL-E isn’t nearly as smart as it appears. Citing Meta’s work on Adversarial NLI (2019), Gary Marcus and Elliot Murphy, in their latest blog post, said that inadequate attention to three factors – namely, reference, cognitive models, and compositionality – has serious consequences:
- Large language models tend to lose coherence over time, drifting into ‘empty’ language with no clear connection to reality
- LLMs have difficulty distinguishing truth from falsehood
- LLMs struggle to avoid perpetuating bias and toxic speech
The duo believes that none of these three issues has been solved, referring to the work of 19th-century logician Gottlob Frege. For example, there is still debate about how much of our everyday language use relies on compositionality and what the right cognitive models of language should be. They added that linguistics has a lot to offer in formulating and thinking about these questions.
Marcus and Murphy said that compositionality has long been a central concept in linguistics and philosophy, yet so-called foundation models – including GPT-3, BERT, etc. – sidestep it. Furthermore, they said compositionality is not the same as what a photo editor might call compositing.
They said that when DALL-E is given a prompt to generate an image of a blue cube on top of a red cube, the tool puts those words together but shows a certain degree of blindness to the parts. For instance, it may produce an image containing both a blue cube and a red cube, but with the red one placed on top instead.
This means that while the system combines the elements, adding them to the output image, it loses the compositionality that captures the relation between those elements.
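The distinction can be illustrated with a toy example (this is not DALL-E’s actual internal representation, just a minimal sketch of the idea): a bag-of-words view of a prompt keeps all the words but discards the relations between them, so two prompts describing opposite scenes become indistinguishable.

```python
from collections import Counter

def bag_of_words(prompt: str) -> Counter:
    """Represent a prompt as an unordered multiset of words,
    discarding all relational (compositional) structure."""
    return Counter(prompt.lower().split())

a = bag_of_words("a blue cube on top of a red cube")
b = bag_of_words("a red cube on top of a blue cube")

# Both prompts contain exactly the same words, so a purely
# bag-of-words representation cannot tell which cube is on top.
print(a == b)  # True
```

A system that only matches parts in this way can render both cubes correctly yet still get their arrangement wrong, which is precisely the failure Marcus and Murphy describe.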
It is fascinating to see machine learning models like GPT-3 and DALL-E 2 gaining immense popularity alongside emerging use cases and applications. However, there is still a long way to go: not only in addressing factors such as compositionality and bias, but also in bringing clarity to their commercial usage.