Like the startling clap that follows distant rumbles of thunder,
artificial intelligence (AI) has arrived. Stunning images and precocious
prose can be generated at the behest of anyone. You need only download
Diffusion
Bee or create a free account at chat.openai.com to toy with
these marvels yourself.
And anyone who does so wouldn’t be surprised by the many headlines
warning that “AI’s threats to jobs
and human happiness are real” [Strickland (2022)].
The thunderclap can be so startling that in July 2022 a Google engineer
thought that machines had finally become sentient — and he was fired for
publicizing his belief [Lemoine (2022)]. At the outset of 2023, anyone
can experiment and draw their own conclusions.
While this technology might, some day soon, be disruptive to artists,
journalists, authors, teachers, copywriters, photographers,
illustrators, and others, it is already a problem for some: online
moderators. At Reddit, Wikipedia, and Stack Overflow, contributors pride
themselves on producing content that is, by their lights, correct. These
are epistemic communities, whose members share an understanding
of what constitutes quality contribution, how it is made, and how to
reward its authors [Tzouris (2002), p. 21].
Reddit and Stack Overflow, especially, gamify the creation of quality
content via “karma” and “reputation.” Such voting is intrinsic to the
platforms’ curation, and members share more substantive and personal
recognition by way of flair and awards. On r/AskHistorians users upvote
questions and answers; additionally, great questions are labeled as
such; flairs are given by moderators to designate areas of expertise based
on previous contributions; and fellow users can give awards including
“helpful,” “awesome answer,” and “wholesome.” Questions and answers are
similarly voted on at Stack Overflow; open questions can have bounties,
and flairs include gold, silver, and bronze badges. At Wikipedia, voting
and ranking is less explicit; nonetheless, some dedicated Wikipedians
keep a public tally of their edit count, the quality assessments of
their articles, and the awards given to them by their peers.
The challenge for epistemic communities in the face of AI is that of
verisimilitude. The latest bots can produce content that looks
good but substantively fails — even if the substance is spot on most of
the time. When I asked ChatGPT about this concept, it responded that
“verisimilitude does not necessarily have to be based on actual truth or
reality. Rather, it refers to the appearance of being true or real, or
the extent to which a work of fiction seems believable or convincing”
[ChatGPT (2022)]. This
aligns with other definitions, including Wiktionary’s user-edited
definition: “1. The property of seeming true, of resembling reality;
resemblance to reality, realism. 2. A statement which merely appears to
be true” [Wiktionary (2022)].
How do I know, however, that someone didn’t use ChatGPT to create the
Wiktionary definition? In this case, both services’ definitions align
with more authoritative references, but this is the challenge for
epistemic communities. When AI can produce verisimilitude — in response
to requests for advice, to questions about history, or queries about
dunder methods in Python — what ought these communities do?
Traditionally, there’s been a rough correlation between the quality of
content and its polish. Now, however, poor-quality content can
evince a brilliant sheen.
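To make the Stack Overflow case concrete: a question about dunder
methods, of the sort mentioned above, concerns Python’s “double
underscore” hooks that customize built-in behavior. A minimal
illustrative sketch (the Karma class is hypothetical):

    class Karma:
        """A toy value type; its dunder methods hook into built-in behavior."""

        def __init__(self, points):
            self.points = points

        def __repr__(self):        # called by repr() and the interactive prompt
            return f"Karma({self.points})"

        def __add__(self, other):  # called by the + operator
            return Karma(self.points + other.points)

    print(Karma(2) + Karma(3))     # prints: Karma(5)

An answer this mechanical is exactly the sort a language model can
render with great polish, whether or not the details are right.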
Unlike the artists who object to their work being used to train AIs,
or the illustrators who fear that their jobs are threatened, or the
teachers who worry cheating will be harder to detect, moderators are
complaining that their communities are seeing polished content that
appears accurate but is not. Two years ago, u/pianobutter on
r/TheoryOfReddit anticipated “The New Generation of Spam Bots are
Coming.” Their thread, on a subreddit dedicated to musing about Reddit
itself, began: “Reddit is about to become a battleground. A test site
for a new age of social media. Perhaps even civilization. Things are
going to get weird.” They believed that “Transformer models +
Reinforcement Learning” would replace “human astroturfers and trolls.”
(ChatGPT uses these techniques; it is a Generative Pre-trained
Transformer chatbot.) Most frighteningly, “Downvoting them won’t help:
by downvoting them you are just training them, making them better.
Ignoring them is no good either. They will participate and their
influence will inevitably increase.” Reddit would be the front line, and
the bots would usher in “a new era of propaganda” [pianobutter (2020)].
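Whether or not today’s spam bots actually learn this way, the worry is
easy to illustrate: if votes of any kind are fed back as a reward
signal, even a downvote is training data. A minimal sketch of such a
feedback loop, with a simple epsilon-greedy bandit standing in for
pianobutter’s “Transformer models + Reinforcement Learning” (the reply
styles and reward scheme are my assumptions, not theirs):

    import random

    # Hypothetical reply styles a karma-farming bot might choose among.
    STYLES = ["confident answer", "folksy anecdote", "plausible citation"]
    counts = {s: 0 for s in STYLES}    # times each style received a vote
    values = {s: 0.0 for s in STYLES}  # running mean reward per style

    def choose(epsilon=0.1):
        """Mostly exploit the best-scoring style; occasionally explore."""
        if random.random() < epsilon:
            return random.choice(STYLES)
        return max(STYLES, key=values.get)

    def update(style, reward):
        """Any vote -- up (+1) or down (-1) -- refines the estimate;
        only silence (reward None) teaches the bot nothing."""
        if reward is None:
            return
        counts[style] += 1
        values[style] += (reward - values[style]) / counts[style]

    update(choose(), -1)  # even a downvote moves the estimate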
Not surprisingly, r/CryptoCurrency was one of the first to be hit
with GPT-style bots. The bots were using GPT-NEO, an inexpensive yet
“extremely powerful neutral langue network” [sic], to farm karma [i_have_chosen_a_name (2021)]. One can imagine a network of
high-karma bots running pump-and-dump cryptocurrency schemes. More
recently, a concerned Redditor posted to r/ModSupport that
r/nostupidquestions is seeing an “increasing large number of GPT style
answer bots which provide nonsensical but reasonable sounding responses”
[Petwins (2022)]. The
karma these bots accumulate could be used to boost propaganda, for
example. In December, another Redditor noted that the problem had spread
to other subreddits, including r/AskReddit, where struggling moderators
had banned over a thousand GPT bots in that week alone.
Even r/houseplants discovered that a well-regarded embroidery of a leaf
posted to their sub turned out to be an AI creation [VVHYY (2022)].
Over at Stack Overflow, where you can find questions and answers about
artificial intelligence algorithms, ChatGPT answers were banned soon
after the tool’s release: though “the answers which ChatGPT produces
have a high rate of being incorrect, they typically look like they might
be good and the answers are very easy to produce” [Makyen (2022)]. A ban,
of course, is only observed by honest contributors. Dishonest
contributors seeking to “reputation farm” must somehow be
moderated — perhaps by capping the number of questions an account
can answer in a day.
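Such a cap is simple to sketch. A minimal, hypothetical example of a
per-account daily answer limit (the threshold and names are
illustrative, not Stack Overflow’s actual policy):

    from collections import defaultdict
    from datetime import date

    MAX_ANSWERS_PER_DAY = 5  # illustrative threshold, not a real site policy

    # Maps (account_id, day) -> number of answers posted that day.
    answer_counts = defaultdict(int)

    def may_answer(account_id):
        """True if the account is still under today's answer cap."""
        return answer_counts[(account_id, date.today())] < MAX_ANSWERS_PER_DAY

    def record_answer(account_id):
        answer_counts[(account_id, date.today())] += 1

A determined farmer could spread answers across many accounts, so a cap
like this is at best one tool among several.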
Even so, you need merely ask ChatGPT to “write me a Wikipedia article
with five cited sources” and it appears to do so — even if some of the
sources don’t, in fact, exist.
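One partial, mechanical defense is to check whether a generated
citation’s URL resolves at all. A minimal sketch using Python’s standard
library (a failed lookup suggests, but does not prove, a fabricated
source; a resolving URL proves nothing about whether it supports the
claim):

    import urllib.error
    import urllib.request

    def source_exists(url, timeout=10.0):
        """True if the URL answers a HEAD request with a non-error status."""
        request = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.status < 400
        except (urllib.error.URLError, ValueError):
            return False

    print(source_exists("https://en.wiktionary.org/wiki/verisimilitude"))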
Because Wikipedia lacks explicit voting, karma farming is not so rife
there. When Wikipedian Ian Watt shared the results of just such a
prompt, another longtime
contributor, Andrew Lih, reviewed it relative to the distinctions
between data, information, knowledge, and wisdom: “Vanilla GPT produces
plausible data, prose that esoterically resembles information, passable
but inconsistent knowledge for certain verticals, and most definitely
not wisdom. The worry comes when the bad and good are commingled and
indistinguishable from one another.” Amusingly, he also noted that when he
asked ChatGPT how to upload media to Wikipedia, “it’s answer was clearer
than most of our on-wiki documentation. I’m not sure if that’s a
compliment to the AI, or an indictment of our documentation” [Lih (2022)]. Whether uploading AI-generated
images is acceptable has been a topic of discussion at Wikimedia Commons
for several years [Owlsmcgee (2019); RAN (2021)]. The most recent discussion was
accompanied by an image whose caption spoke to one opinion on the
copyright of the resulting image: “‘An astronaut riding a horse, in the
style of Monet’. Monet did not paint this image, and even if he were
alive today, he is not the copyright holder of this work simply because
of the brushstroke patterns” [Arkesteijn (2022)].
Many other issues are implicated as well.
As the many headlines indicate, the widespread availability of
diffusion- and transformer-based AI has far-reaching implications for
the near future. But people at Reddit, Stack Overflow, and Wikipedia are
grappling with those implications today. And many of us will soon be
grappling with the meaning of verisimilitude in the digital age as it is
used to infiltrate the epistemic communities we rely upon in a world
already struggling with misinformation.
(Thanks to Sarah Ann Gilbert
for discussing this with me.)
References
Arkesteijn, Jan. 2022.
“Commons:Village pump/Archive/2022/10.” Wikimedia Commons,
October 21.
https://commons.wikimedia.org/wiki/Commons:Village_pump/Archive/2022/10#AI-generated_works.
ChatGPT. 2022.
“A Chat about Verisimilitude.” OpenAI,
December 15.
https://chat.openai.com/chat.
i_have_chosen_a_name. 2021.
“/r/cryptocurrency is being run over by GPT-NEO
bots. Every single topic you make, not matter what it is about will
instantly have 5 -10 comments made by bots. A good 40% of new comments
made here are made by bots.” r/CryptoCurrency, August 21.
https://www.reddit.com/r/CryptoCurrency/comments/p8m0ik/rcryptocurrency_is_being_run_over_by_gptneo_bots/.
Lemoine, Blake. 2022.
“Is LaMDA Sentient? — an Interview by Blake
Lemoine.” Medium, June 11.
https://cajundiscordian.medium.com/is-lamda-sentient-an-interview-ea64d916d917.
Lih, Andrew. 2022.
“now that said….” Mastodon, December 7.
https://wikis.world/@fuzheado/109467318402404985.
Makyen. 2022.
“Temporary Policy: ChatGPT Is Banned.” Meta
Stack Overflow, December 8.
https://meta.stackoverflow.com/questions/421831/temporary-policy-chatgpt-is-banned.
Petwins. 2022.
“We are noticing a sharp influx of bots in
nostupidquestions.” r/ModSupport,
October 2.
https://www.reddit.com/r/ModSupport/comments/xtmzx0/we_are_noticing_a_sharp_influx_of_bots_in/.
pianobutter. 2020.
“The new generation
of Spam Bots are coming: Where do we go from here?” r/TheoryOfReddit, August 11.
https://www.reddit.com/r/TheoryOfReddit/comments/i7yd7m/the_new_generation_of_spam_bots_are_coming_where/.
Strickland, Eliza. 2022.
“AI’s Threats to Jobs and Human Happiness
Are Real.” IEEE Spectrum, May 12.
https://spectrum.ieee.org/kai-fu-lee-ai-jobs.
Tzouris, Menelaos. 2002.
“Software Freedom, Open Software and the
Participant’s Motivation - a Multidisciplinary Study.”
M.Sc. thesis, London School of Economics and Political Science.
http://opensource.mit.edu/papers/tzouris.pdf.
Wiktionary. 2022.
“Verisimilitude.” December 10.
https://en.wiktionary.org/wiki/verisimilitude.