Disguising Reddit sources and the efficacy of ethical research

Joseph Reagle



light, moderate and …

false details; no verbatim quotes if a “search mechanism could link those quotes to the person in question… someone deliberately seeking to find a subject’s identity would likely be unable to do so” (Bruckman, 2002).

Does it work?


  • King (1996) faulted Finn and Lavitt (1994) for disguising sources’ names but not the name of the sexual abuse forum or the dates and times of posts.
  • Zimmer (2010) critiqued researchers for creating a “Tastes, Ties, and Time” Facebook dataset that was improperly — perhaps impossibly — “anonymized.”
  • Barbaro and Zeller (2006) reported on, and confirmed, the potential to locate sources in an AOL dataset.
  • Singer (2015), wanting to speak to a source in a research study, was able to identify, contact, and interview the source.


Research reports

  • Looked for Reddit research from the past five years using keywords such as “privacy,” “verbatim,” “fabrication,” and “AoIR guidelines.”
  • Found three reports using verbatim phrases and three using reworded phrases.
  • Attempted to locate Redditors’ phrases in the six reports with Reddit, Google, and RedditSearch/Pushshift.
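The search step can be illustrated with a minimal sketch that turns a phrase into exact-phrase search URLs. This is an illustration under assumptions, not the procedure used in the study: the function names, the `site:reddit.com` restriction, and the example phrase are hypothetical, and RedditSearch/Pushshift is left out because its query format is not described here.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def exact_phrase_urls(phrase: str) -> dict[str, str]:
    """Build exact-phrase search URLs for a quoted phrase (hypothetical helper)."""
    quoted = f'"{phrase}"'  # double quotes request an exact-phrase match
    return {
        # Assumption: restrict the Google search to reddit.com
        "google": "https://www.google.com/search?"
        + urlencode({"q": f"{quoted} site:reddit.com"}),
        "reddit": "https://www.reddit.com/search/?" + urlencode({"q": quoted}),
    }


def query_text(url: str) -> str:
    """Recover the decoded query string that the search engine would receive."""
    return parse_qs(urlparse(url).query)["q"][0]


urls = exact_phrase_urls("an invented example phrase")
for engine, url in urls.items():
    print(engine, "->", query_text(url))
```

A reworded phrase that still returns its source thread under such a query has not been effectively disguised.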


  • Three researchers consented to speak with me.
  • We discussed their practice, rationale, influences, and thoughts about my efforts.
  • Though I used public research reports and their own Reddit sources in my analysis, none of this is identified or quoted.


Efficacy of disguise

report  approach  sources  located  note
V1      verbatim  18       17       - leaked non-throwaway accounts
V2      verbatim  17       15       - didn’t account for deleted posts
V3      verbatim  6        6        - inconsistent description/practice
R1      reworded  2        0        + preferred interviews to posts
R2      reworded  5        5        - posts found via thread title
R3      reworded  8        0        + disguises tested by researchers
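The counts in the table can be totaled per approach with a short sketch. The numbers are copied from the table; the variable and function names are illustrative only.

```python
from collections import defaultdict

# (report, approach, sources, located), copied from the table above
REPORTS = [
    ("V1", "verbatim", 18, 17),
    ("V2", "verbatim", 17, 15),
    ("V3", "verbatim", 6, 6),
    ("R1", "reworded", 2, 0),
    ("R2", "reworded", 5, 5),
    ("R3", "reworded", 8, 0),
]


def located_by_approach(reports):
    """Sum sources and located sources for each disguise approach."""
    totals = defaultdict(lambda: [0, 0])
    for _report, approach, sources, located in reports:
        totals[approach][0] += sources
        totals[approach][1] += located
    return {approach: (s, l) for approach, (s, l) in totals.items()}


for approach, (sources, located) in located_by_approach(REPORTS).items():
    print(f"{approach}: {located}/{sources} located ({located / sources:.0%})")
```

On these counts, 38 of 41 verbatim sources and 5 of 15 reworded sources were located, and all five located reworded sources come from R2.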

Thank you!

Macaca nigra self-portrait (rotated and cropped)