Published: Fri 12 May 2023
By Joseph Reagle
In social.
tags: reddit ethics methods
On TheoryOfReddit, Brian Keegan has posted an open letter regarding
Reddit’s tightening of its API access, especially the cutting off of
Pushshift’s access.
Pushshift is/was a third-party repository of Reddit data – used by
researchers and mods – that had difficulty keeping up with deletion
requests, among other things. It was also used by those wanting to find
deleted messages.
This issue – that Pushshift violated Reddit users’ privacy
expectations by retaining data, requiring an additional opt-out step,
and then failing to act quickly – is one of the purported reasons for
its removal from the Reddit API.
For the past few years, I’ve been seeking to answer three
questions relevant to this issue, especially for advice subreddits:
How many users actually delete their posts?
How long does it take for them to do so?
Do users actually worry about their deleted messages surviving?
I answer all these questions in a draft (under review). For example,
Table 1 shows levels of removal and deletion across varied subreddits.
Feedback on the draft is welcome. Of course, without Pushshift I can
no longer extend the data itself.
Table 1 shows that removal and deletion are common, especially on the
advice subreddits. The popular advice subreddits have significantly more
deletions (48%) than other sensitive subreddits (32.4%), which have
significantly more deletions than tech-related subreddits (20.2%).
Moderation has increased over the years, with removals on
r/AmItheAsshole going from 14% to 47% to 78%!
Table 1: Percent of submissions deleted and [removed].
tech subreddits: 20.0% [38.1%]
sensitive subreddits: 32.4% [16.2%]
Advice: 51.6% [09.7%], 53.0% [12.3%], 47.4% [42.8%]
AmItheAsshole: 45.8% [13.9%], 48.9% [47.1%], 43.1% [78.4%]
relationship_advice: 55.9% [09.5%], 58.9% [09.8%], 53.7% [48.0%]
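The figures in Table 1 are proportions of submissions, and a minimal sketch of how such a cell could be tallied may make the notation concrete. This is not the method used in the draft: the `Submission` record, its two flags, and both function names are hypothetical. It assumes each submission has already been labeled as deleted by its author and/or removed by moderators. The two flags are counted independently because some cells (such as 43.1% [78.4%]) sum to more than 100%, which suggests a submission can be both deleted and removed.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """Hypothetical record; the flags are assumed to be labeled upstream."""
    deleted: bool  # deleted by the author
    removed: bool  # removed by moderators


def percent_deleted_removed(submissions):
    """Return (percent deleted, percent removed), each rounded to one decimal."""
    total = len(submissions)
    if total == 0:
        return (0.0, 0.0)
    # Booleans sum as 0/1, so these are simple counts of flagged submissions.
    n_deleted = sum(s.deleted for s in submissions)
    n_removed = sum(s.removed for s in submissions)
    return (round(100 * n_deleted / total, 1), round(100 * n_removed / total, 1))


def format_cell(pct_deleted, pct_removed):
    """Format a cell in the table's style, e.g. '09.7% [12.3%]'."""
    return f"{pct_deleted:04.1f}% [{pct_removed:04.1f}%]"


# Made-up sample: 2 of 5 deleted, 3 of 5 removed, 1 of them both.
sample = [
    Submission(deleted=True, removed=False),
    Submission(deleted=True, removed=True),
    Submission(deleted=False, removed=True),
    Submission(deleted=False, removed=True),
    Submission(deleted=False, removed=False),
]

print(format_cell(*percent_deleted_removed(sample)))
```

Counting the flags separately, rather than assigning each submission a single status, lets the two percentages overlap as they appear to in the table.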
The popular technology-related subreddits consisted of: Android,
apple, audiophile, buildapc, DataHoarder, electronics, gadgets,
hardware, ipad, linux, mac, sysadmin, techsupport, web, windows. The
sensitive subreddits were those studied by Gaur et al. (2019):
Anxiety, BPD, BipolarReddit, BipolarSOs, StopSelfHarm, SuicideWatch,
addiction, aspergers, autism, bipolar, depression, opiates,
schizophrenia, selfharm; cripplingalcoholism is not included because it
was made private earlier in 2022.
There are comments.