Reddit, Pushshift, and deletion

On TheoryOfReddit, Brian Keegan has posted an open letter regarding Reddit’s tightening of its API access, especially the cutting off of Pushshift’s access.

Pushshift is/was a third-party repository of Reddit data – used by researchers and mods – that had difficulty keeping up with deletion requests, among other things. It was also used by those wanting to find deleted messages.

This issue – that Pushshift violated Reddit users’ privacy expectations by retaining data, requiring an additional opt-out step, and then failing to act quickly – is one of the purported reasons for its removal from the Reddit API.

For the past few years, I’ve been seeking to understand three questions relevant to this issue, especially for advice subreddits:

  1. How many users actually delete their posts?
  2. How long does it take for them to do so?
  3. Do users actually worry about their deleted messages surviving?

I answer all these questions in a draft (under review). For example, Table 1 shows levels of removal and deletion across varied subreddits.

Feedback on the draft is welcome. Of course, without Pushshift I can no longer extend the data itself.

Table 1 shows that removal and deletion are common, especially on the advice subreddits. The popular advice subreddits have significantly more deletions (48%) than other sensitive subreddits (32.4%), which have significantly more deletions than tech-related subreddits (20.2%). Moderation has increased over the years, with removals on r/AmItheAsshole going from 14% to 47% to 78%!

Table 1: Percent of submissions deleted and [removed].

subreddit              2018-Mar+        2020-Mar+        2022-Mar+
tech subreddits                                          20.0% [38.1%]
sensitive subreddits                                     32.4% [16.2%]
Advice                 51.6% [09.7%]    53.0% [12.3%]    47.4% [42.8%]
AmItheAsshole          45.8% [13.9%]    48.9% [47.1%]    43.1% [78.4%]
relationship_advice    55.9% [09.5%]    58.9% [09.8%]    53.7% [48.0%]
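
As a quick sanity check on the summary figures, the 48% deletion rate quoted for the popular advice subreddits is consistent with an unweighted mean of the three 2022-Mar+ values in Table 1. The sketch below assumes that is how the figure was derived (the draft may weight by submission counts instead); the variable and function names are illustrative only.

```python
# Deleted-submission percentages for 2022-Mar+, copied from Table 1.
deleted_2022 = {
    "Advice": 47.4,
    "AmItheAsshole": 43.1,
    "relationship_advice": 53.7,
}

# [removed] percentages for r/AmItheAsshole across the three periods in Table 1.
aita_removed = [13.9, 47.1, 78.4]


def unweighted_mean(values):
    """Plain average; an assumption, since the draft may weight by volume."""
    values = list(values)
    return sum(values) / len(values)


advice_mean = unweighted_mean(deleted_2022.values())
aita_rounded = [round(pct) for pct in aita_removed]

print(f"advice subreddits, mean deleted: {advice_mean:.1f}%")
print(f"r/AmItheAsshole removed, rounded: {aita_rounded}")
```

The rounded removal rates line up with the 14%, 47%, and 78% quoted for r/AmItheAsshole.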

The popular technology-related subreddits consisted of: Android, apple, audiophile, buildapc, DataHoarder, electronics, gadgets, hardware, ipad, linux, mac, sysadmin, techsupport, web, windows. The sensitive subreddits were those studied by Gaur et al. (2019): Anxiety, BPD, BipolarReddit, BipolarSOs, StopSelfHarm, SuicideWatch, addiction, aspergers, autism, bipolar, depression, opiates, schizophrenia, selfharm. The subreddit cripplingalcoholism is not included because it was made private earlier in 2022.

