Verisimilitude: The AI storm is already here for moderators

Like the startling clap that follows distant rumbles of thunder, artificial intelligence (AI) has arrived. Stunning images and precocious prose can be generated at the behest of anyone. You need only download Diffusion Bee or create a free account at chat.openai.com to toy with these marvels yourself.

And anyone who does so wouldn’t be surprised by the many headlines warning that “AI’s threats to jobs and human happiness are real” (Strickland 2022). The thunderclap can be so startling that a Google engineer thought that machines had finally become sentient, and in July 2022 he was fired for publicizing his belief (Lemoine 2022). At the outset of 2023, anyone can experiment and draw their own conclusions.

While this technology might, some day soon, be disruptive to artists, journalists, authors, teachers, copywriters, photographers, illustrators, and others, it is already a problem for some: online moderators. At Reddit, Wikipedia, and Stack Overflow, contributors pride themselves on producing content that is, by their lights, correct. These are epistemic communities, whose members share an understanding of what constitutes quality contribution, how it is made, and how to reward its authors (Tzouris 2002, 21).

Reddit and Stack Overflow, especially, gamify the creation of quality content via “karma” and “reputation.” Such voting is intrinsic to the platforms’ curation, and members share more substantive and personal recognition by way of flair and awards. On r/AskHistorians users upvote questions and answers; additionally, great questions are labeled as such; flairs are given by moderators to designate areas of expertise based on previous contributions; and fellow users can give awards including “helpful,” “awesome answer,” and “wholesome.” Questions and answers are similarly voted on at Stack Overflow; open questions can have bounties, and flairs include gold, silver, and bronze badges. At Wikipedia, voting and ranking are less explicit; nonetheless, some dedicated Wikipedians keep a public tally of their edit count, the quality assessments of their articles, and the awards given to them by their peers.

The challenge for epistemic communities in the face of AI is that of verisimilitude. The latest bots can produce content that looks good but substantively fails — even if the substance is spot on most of the time. When I asked ChatGPT about this concept, it responded that “verisimilitude does not necessarily have to be based on actual truth or reality. Rather, it refers to the appearance of being true or real, or the extent to which a work of fiction seems believable or convincing” (ChatGPT 2022). This aligns with other definitions, including Wiktionary’s user-edited definition: “1. The property of seeming true, of resembling reality; resemblance to reality, realism. 2. A statement which merely appears to be true” (“Verisimilitude” 2022).

How do I know, however, that someone didn’t use ChatGPT to create the Wiktionary definition? In this case, both services’ definitions align with more authoritative references, but this is the challenge for epistemic communities. When AI can produce verisimilitude in response to requests for advice, questions about history, or queries about dunder methods in Python, what ought these communities to do? Traditionally, there’s been a rough correlation between the quality of content and its polish. Poor quality content can now, nonetheless, evince a brilliant sheen.

Unlike the artists who object to their work being used to train AIs, or the illustrators who fear that their jobs are threatened, or the teachers who worry cheating will be harder to detect, moderators are complaining that their communities are seeing polished content that appears accurate but is not. Two years ago, u/pianobutter on r/TheoryOfReddit anticipated “The New Generation of Spam Bots are Coming.” Their thread, on a subreddit dedicated to musing about Reddit itself, began: “Reddit is about to become a battleground. A test site for a new age of social media. Perhaps even civilization. Things are going to get weird.” They believed that “Transformer models + Reinforcement Learning” would replace “human astroturfers and trolls.” (ChatGPT uses these techniques; it is a Generative Pre-trained Transformer chatbot.) Most frighteningly, “Downvoting them won’t help: by downvoting them you are just training them, making them better. Ignoring them is no good either. They will participate and their influence will inevitably increase.” Reddit would be the front line, and the bots would usher in “a new era of propaganda” (pianobutter 2020).

Not surprisingly, r/CryptoCurrency was one of the first to be hit with GPT-style bots. The bots were using GPT-NEO, an inexpensive yet “extremely powerful neural language network,” to farm karma (i_have_chosen_a_name 2021). One can imagine a network of high-karma bots running pump-and-dump cryptocurrency schemes. More recently, a concerned Redditor posted to r/ModSupport that r/nostupidquestions is seeing an “increasing large number of GPT style answer bots which provide nonsensical but reasonable sounding responses” (Petwins 2022). The karma these bots accumulate could be used to boost propaganda, for example. In December, another Redditor noted the problem had spread to other subreddits, including r/AskReddit, and the mods had banned over a thousand GPT bots in that week alone; the moderators were struggling. Even r/houseplants discovered that a well-regarded embroidery of a leaf posted to their sub turned out to be an AI creation (VVHYY 2022).

Over at Stack Overflow, where you can find questions and answers about artificial intelligence algorithms, ChatGPT answers were banned soon after the tool’s release: though “the answers which ChatGPT produces have a high rate of being incorrect, they typically look like they might be good and the answers are very easy to produce” (Makyen 2022). A ban, of course, is only observed by honest contributors. Dishonest contributors who are seeking to “reputation farm” need to somehow be moderated, perhaps by limiting an account’s ability to answer a given number of questions in a day.
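
To make that last suggestion concrete, here is a minimal sketch of a per-account daily cap on answers. It is only an illustration: the class name `DailyAnswerLimiter`, the `try_post` method, the account name, and the cap of two answers are all hypothetical, and nothing here describes how Stack Overflow or any other platform actually throttles contributors.

```python
from collections import defaultdict
from datetime import date


class DailyAnswerLimiter:
    """Hypothetical per-account cap on answers accepted per calendar day."""

    def __init__(self, max_per_day=5):
        # The cap is an assumed, illustrative value, not a real platform setting.
        self.max_per_day = max_per_day
        # Maps (account, day) to the number of answers accepted so far.
        self._counts = defaultdict(int)

    def try_post(self, account, day):
        """Record an answer and return True, or return False once the cap is reached."""
        key = (account, day)
        if self._counts[key] >= self.max_per_day:
            return False
        self._counts[key] += 1
        return True


limiter = DailyAnswerLimiter(max_per_day=2)
today = date(2022, 12, 15)
results = [limiter.try_post("example_account", today) for _ in range(3)]
print(results)  # [True, True, False]
```

A cap like this slows down an account that churns out answers, but it cannot tell a prolific expert from a bot, so it would complement human moderation rather than replace it.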

Because Wikipedia lacks explicit voting, karma farming is not so rife. Even so, you need merely ask ChatGPT to “write me a Wikipedia article with five cited sources” and it appears to do so, even if some of the sources don’t, in fact, exist. When Wikipedian Ian Watt shared his resulting example, another longtime contributor, Andrew Lih, reviewed it relative to the distinctions between data, information, knowledge, and wisdom: “Vanilla GPT produces plausible data, prose that esoterically resembles information, passable but inconsistent knowledge for certain verticals, and most definitely not wisdom. The worry comes when the bad and good are commingled and indistinguishable from one another.” Amusingly, he also noted that when he asked ChatGPT how to upload media to Wikipedia, “it’s answer was clearer than most of our on-wiki documentation. I’m not sure if that’s a compliment to the AI, or an indictment of our documentation” (Lih 2022). Whether uploading AI-generated images is acceptable has been a topic of discussion at Wikimedia Commons since at least 2019 (Owlsmcgee 2019; RAN 2021). The most recent discussion was accompanied by an image whose caption spoke to one opinion on the copyright of the resulting image: “‘An astronaut riding a horse, in the style of Monet’. Monet did not paint this image, and even if he were alive today, he is not the copyright holder of this work simply because of the brushstroke patterns” (Arkesteijn 2022). There are many other issues implicated as well.

[Image: An astronaut riding a horse (Monet), 2022-08-30]

As the many headlines indicate, the widespread availability of stable diffusion and transformer AI has far-reaching implications for the near future. But people at Reddit, Stack Overflow, and Wikipedia are grappling with those implications today. And many of us will soon be grappling with the meaning of verisimilitude in the digital age as it is used to infiltrate the epistemic communities we rely upon in a world already struggling with misinformation.

(Thanks to Sarah Ann Gilbert for discussing this with me.)

References

Arkesteijn, Jan. 2022. “Commons:Village pump/Archive/2022/10.” Wikimedia Commons. October 21, 2022. https://commons.wikimedia.org/wiki/Commons:Village_pump/Archive/2022/10#AI-generated_works.
ChatGPT. 2022. “A Chat about Verisimilitude.” OpenAI. December 15, 2022. https://chat.openai.com/chat.
i_have_chosen_a_name. 2021. “/r/cryptocurrency Is Being Run over by GPT-NEO Bots. Every Single Topic You Make, Not Matter What It Is about Will Instantly Have 5 -10 Comments Made by Bots. A Good 40% of New Comments Made Here Are Made by Bots.” r/CryptoCurrency. https://www.reddit.com/r/CryptoCurrency/comments/p8m0ik/rcryptocurrency_is_being_run_over_by_gptneo_bots/.
Lemoine, Blake. 2022. “Is LaMDA Sentient? — an Interview by Blake Lemoine.” Medium (blog). June 11, 2022. https://cajundiscordian.medium.com/is-lamda-sentient-an-interview-ea64d916d917.
Lih, Andrew. 2022. “Now That Said….” Mastodon. https://wikis.world/@fuzheado/109467318402404985.
Makyen. 2022. “Temporary Policy: ChatGPT Is Banned.” Meta Stack Overflow. December 8, 2022. https://meta.stackoverflow.com/questions/421831/temporary-policy-chatgpt-is-banned.
Owlsmcgee. 2019. “Commons:Village pump/Archive/2019/09.” Wikimedia Commons. September 23, 2019. https://commons.wikimedia.org/wiki/Commons:Village_pump/Archive/2019/09#Policies_around_images_created_by_Artificial_Intelligence_applications_(such_as_GANs)?h.
Petwins. 2022. “We Are Noticing a Sharp Influx of Bots in Nostupidquestions.” r/ModSupport. https://www.reddit.com/r/ModSupport/comments/xtmzx0/we_are_noticing_a_sharp_influx_of_bots_in/.
pianobutter. 2020. “The New Generation of Spam Bots Are Coming: Where Do We Go from Here?” r/TheoryOfReddit. https://www.reddit.com/r/TheoryOfReddit/comments/i7yd7m/the_new_generation_of_spam_bots_are_coming_where/.
RAN. 2021. “Commons:Village pump/Archive/2021/02.” Wikimedia Commons. February 26, 2021. https://commons.wikimedia.org/wiki/Commons:Village_pump/Archive/2021/02#What_is_the_Wikimedia_Commons_position_on_storing_AI_enhanced_historic_images.
Strickland, Eliza. 2022. “AI’s Threats to Jobs and Human Happiness Are Real.” IEEE Spectrum, May 12, 2022. https://spectrum.ieee.org/kai-fu-lee-ai-jobs.
Tzouris, Menelaos. 2002. “Software Freedom, Open Software and the Participant’s Motivation - a Multidisciplinary Study.” M.Sc. thesis, London School of Economics and Political Science. http://opensource.mit.edu/papers/tzouris.pdf.
“Verisimilitude.” 2022. Wiktionary. December 10, 2022. https://en.wiktionary.org/wiki/verisimilitude.
VVHYY. 2022. “r/houseplants Grapples with AI Generated Content.” r/SubredditDrama. https://www.reddit.com/r/SubredditDrama/comments/zpg4q8/rhouseplants_grapples_with_ai_generated_content/.
