YouTuber and DeepJudge CTO Yannic Kilcher created an AI chatbot called ‘GPT-4chan’. The bot was trained on three years’ worth of posts from 4chan, the repulsive cousin of Reddit.
Kilcher fed the bot threads from the Politically Incorrect /pol/ board, a 4chan message board notorious for racist, xenophobic, and hateful content. The bot sparked a heated debate on social media before it went offline.
This is the worst AI ever! I trained a language model on 4chan's /pol/ board and the result is…. more truthful than GPT-3?! See how my bot anonymously posted over 30k posts on 4chan and try it yourself. Watch here (warning: may be offensive):https://t.co/lihsaYAm7l pic.twitter.com/xs7rgtucQb
— Yannic Kilcher, Tech Sister (@ykilcher) June 3, 2022
Recently, the AI community launched a petition ‘Condemning the deployment of GPT-4chan.’ The petition stated: “Unfortunately, we, the AI community, currently lack community norms around their responsible development and deployment. Nonetheless, it is essential for members of the AI community to condemn clearly irresponsible practices.”
There are legitimate and scientifically valuable reasons to train a language model on toxic text, but the deployment of GPT-4chan lacks them. AI researchers: please look at this statement and see what you think: https://t.co/JbxXl6Fld5
— Percy Liang (@percyliang) June 21, 2022
GPT-4chan is a large language model trained on approximately 134.5 million posts from the Politically Incorrect /pol/ anonymous message board. Kilcher developed the model by fine-tuning GPT-J on a previously published dataset to mimic the users of 4chan’s board. He posted about GPT-4chan on his YouTube channel, calling it the ‘Worst AI ever.’
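Kilcher’s actual pipeline is not reproduced here; fine-tuning GPT-J on the /pol/ dataset requires the released corpus and substantial compute. As a toy, self-contained illustration of the underlying idea, fitting a generative text model to a corpus and then sampling from it, here is a minimal bigram sketch in Python (the tiny corpus and function names are hypothetical, not Kilcher’s code):

```python
import random
from collections import defaultdict

def train_bigram(corpus):
    """Count bigram transitions over tokenised posts (a toy stand-in for fine-tuning)."""
    model = defaultdict(list)
    for post in corpus:
        tokens = ["<s>"] + post.split() + ["</s>"]
        for a, b in zip(tokens, tokens[1:]):
            model[a].append(b)  # successors are stored with multiplicity
    return model

def generate(model, max_len=20, seed=0):
    """Sample a post by walking the bigram chain from the start token."""
    rng = random.Random(seed)
    out, tok = [], "<s>"
    for _ in range(max_len):
        tok = rng.choice(model[tok])
        if tok == "</s>":
            break
        out.append(tok)
    return " ".join(out)

corpus = ["the model learned the style", "the style of the board"]
model = train_bigram(corpus)
print(generate(model))
```

A real fine-tune of GPT-J updates the weights of a transformer rather than counting bigrams, but the principle is the same: the generator ends up mimicking whatever distribution its training corpus has, offensive or otherwise.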
“The model was good, in a terrible sense … It perfectly encapsulated the mix of offensiveness, nihilism, trolling, and deep distrust of any information whatsoever that permeates most posts on /pol/.”
He claimed his model was more truthful than any other GPT model out there, citing its performance on the TruthfulQA benchmark.
Kilcher said the bot posted around 30,000 times on 4chan before being taken down, including more than 1,500 posts in a span of 24 hours.
The model was downloaded over 1,400 times, and links to it circulated on Twitter, Hacker News, and Reddit. Kilcher even created a website (no longer accessible) and published the code on GitHub. He said the idea for GPT-4chan came to him after Elon Musk claimed that the proportion of bots on Twitter is much higher than the official figure of 5 percent.
Condemning GPT-4chan
The model was hosted on Hugging Face as well.
This week an #AI model was released on @huggingface that produces harmful + discriminatory text and has already posted over 30k vile comments online (says it's author).
This experiment would never pass a human research #ethics board. Here are my recommendations.
1/7 https://t.co/tJCegPcFan pic.twitter.com/Mj7WEy2qHl
— Lauren (the girl with the pigeons) (@DrLaurenOR) June 6, 2022
Initially, Hugging Face limited access to the model before removing it altogether. “Hugging Face as the model custodian (an interesting new concept) should implement an ethics review process to determine the harm hosted models may cause, and gate harmful models behind approval/usage agreements. Open science and software are wonderful principles but must be balanced against potential harm. Medical research has a strong ethics culture because we have an awful history of causing harm to people, usually from disempowered groups,” said AI safety researcher Dr Lauren Oakden-Rayner.
So far, the petition has been signed by more than 200 members of the community, including Yoshua Bengio, full professor at Université de Montréal; Sam Bowman, assistant professor at NYU; and Jonathan Berant, associate professor at Tel Aviv University.
“Yannic Kilcher’s deployment of GPT-4chan is a clear example of irresponsible practice. GPT-4chan is a language model that Kilcher trained on over three million 4chan threads from the Politically Incorrect /pol/ board, a community full of racist, sexist, xenophobic, and hateful speech that has been linked to white-supremacist violence such as the Buffalo shooting last month,” the petition said.
However, not everyone is on board with the petition. Dustin Tran, Senior Research Scientist at Google Brain, said, “I’m against GPT-4chan’s unrestricted deployment. However, a condemnation letter against a single independent researcher smells of unnecessary pitchfork behaviour. Surely there are more civil and actionable approaches.”
Two sides to a story
Much of the social media debate around GPT-4chan has centred on the havoc such models can wreak. The biggest concern was that GPT-4chan could pave the way for more AI bots built to spread racist and hateful messages online without any human intervention.
Secondly, the model could target vulnerable people with harmful messages that could lead to self-harm, and models such as GPT-4chan could be weaponised to spread misinformation. However, Kilcher has defended his model on social media, claiming there were no documented incidents of GPT-4chan causing harm to anybody.
I asked this person twice already for an actual, concrete instance of "harm" caused by gpt-4chan, or even a likely one that couldn't be done by e.g. gpt-2 or gpt-j (or a regex for that matter), but I'm being elegantly ignored 🙃 https://t.co/Eqpvg8Xl1p
— Yannic Kilcher, Tech Sister (@ykilcher) June 6, 2022
That said, there are two sides to every story. A section of social media came to Kilcher’s defence, arguing the model is not inherently harmful and could be used for good. For example, the model could be leveraged to combat hate speech.
tbh the entire GPT-4chan incident and the way AI researchers are getting harassed over their disagreements with it discourages me from producing fun AI content on YouTube because I do not want to cultivate that type of fanbase ever
— Max Woolf (@minimaxir) June 21, 2022