Disguising sources and spinning phrases

Joseph Reagle

2021-05-07

Praxis

Sources as… sources

Advice forums

Should I disguise online sources by eliding usernames and altering quotations?

Health and relationships are sensitive topics, even when advice about them is shared in public under pseudonyms.

Literature

What’s the word?

Disguise (Bruckman, 2002)

light
identify forum, change usernames; “verbatim quotes may be used, even if they could be used to identify an individual… an outsider could probably figure out who is who with a little investigation.”
heavy
false details; no verbatim quotes if a “search mechanism could link those quotes to the person in question… someone deliberately seeking to find a subject’s identity would likely be unable to do so.”
moderate
“a compromise position … as appropriate”

Following…

  • King (1996) faulted Finn and Lavitt (1994) for disguising sources’ names but not the name of the sexual abuse forum or the dates and times of posts.
  • Zimmer (2010) critiqued researchers for creating a “Tastes, Ties, and Time” Facebook dataset that was improperly — perhaps impossibly — “anonymized.”
  • Barbaro and Zeller (2006) reported on, and confirmed, the potential to locate sources in an AOL dataset.
  • Singer (2015), wanting to speak to a source in a research study, was able to identify, contact, and interview the source.

Method

Research reports

  • Looked for Reddit research from the past five years using keywords such as “privacy,” “verbatim,” “fabrication,” and “AoIR guidelines.”
  • Found three reports using verbatim phrases and three using reworded phrases.
  • Attempted to locate Redditors’ phrases in the six reports.

Searching Reddit

Reddit
provides searching of all posts, but not comments, via the website’s search bar and the Application Programming Interface (API).
Google
indexes all of Reddit, which is especially useful for finding comments. Google searches can be narrowed by way of the time and site fields.
Pushshift.io (RedditSearch UI)
is a third-party index of Reddit. It indexes posts and comments and provides many search fields via its API, including time ranges.
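The three search routes above can be sketched as query URLs. This is a minimal illustration, not the procedure used in the study: the function names are hypothetical, the example phrase is made up, and the endpoints and parameters (Reddit's `search.json`, Google's `site:`, `after:`, and `before:` operators, and Pushshift's comment search) are assumptions about services whose availability and access rules may have changed. The code only builds URLs and sends no requests.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def to_epoch(day: str) -> int:
    """Convert a YYYY-MM-DD date (taken as UTC midnight) to epoch seconds."""
    parsed = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def reddit_post_search_url(phrase: str) -> str:
    """Reddit's own search: covers posts, not comments (assumed endpoint)."""
    return "https://www.reddit.com/search.json?" + urlencode({"q": f'"{phrase}"'})


def google_search_url(phrase: str, after: str = "", before: str = "") -> str:
    """Google search narrowed by the site field and, optionally, by time."""
    terms = [f'"{phrase}"', "site:reddit.com"]
    if after:
        terms.append(f"after:{after}")
    if before:
        terms.append(f"before:{before}")
    return "https://www.google.com/search?" + urlencode({"q": " ".join(terms)})


def pushshift_comment_search_url(phrase: str, after: str, before: str) -> str:
    """Pushshift comment search with a time range (assumed endpoint)."""
    params = {"q": f'"{phrase}"', "after": to_epoch(after), "before": to_epoch(before)}
    return "https://api.pushshift.io/reddit/search/comment/?" + urlencode(params)


phrase = "a made-up phrase"  # hypothetical; not taken from any report or post
for url in (
    reddit_post_search_url(phrase),
    google_search_url(phrase, after="2020-01-01", before="2021-01-01"),
    pushshift_comment_search_url(phrase, "2020-01-01", "2021-01-01"),
):
    print(url)
```

Dropping the double quotes from the Google query gives a non-exact search instead of an exact-phrase match.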

Interviews

  • Three researchers consented to speak with me.
  • We discussed their practice, rationale, influences, and thoughts about my efforts.

Ethics

Though my analysis used public research reports and their Reddit sources, I do not identify or quote any of them.

Analysis

Search tools

  • Reddit is excellent at finding verbatim content from a post but does not support searching for the comments that follow a post.
  • Google can search posts and comments and was useful for non-exact (non-quoted) searches.
  • RedditSearch, a human-friendly interface for Pushshift’s index and API, is the most potent tool, enabling sophisticated searches and maintaining copies of messages that have since been edited or deleted.

Efficacy of disguise

report  approach  sources  located  +/-  note
V1      verbatim       18       17   -   leaked non-throwaway accounts
V2      verbatim       17       15   -   didn’t account for deleted posts
V3      verbatim        6        6   -   inconsistent description/practice
R1      reworded        2        0   +   preferred interviews to posts
R2      reworded        5        5   -   posts found via thread title
R3      reworded        8        0   +   disguises tested by researchers
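The counts in the table above can be totaled by approach to compare how often sources were located. This is a simple tally of the table's own numbers, offered as a sketch rather than as an analysis from the talk; the function name `location_rates` is hypothetical.

```python
# (report, approach, sources, located), copied from the table above.
ROWS = [
    ("V1", "verbatim", 18, 17),
    ("V2", "verbatim", 17, 15),
    ("V3", "verbatim", 6, 6),
    ("R1", "reworded", 2, 0),
    ("R2", "reworded", 5, 5),
    ("R3", "reworded", 8, 0),
]


def location_rates(rows):
    """Return {approach: (located, sources, located / sources)}."""
    totals = {}
    for _report, approach, sources, located in rows:
        prev_sources, prev_located = totals.get(approach, (0, 0))
        totals[approach] = (prev_sources + sources, prev_located + located)
    return {
        approach: (located, sources, located / sources)
        for approach, (sources, located) in totals.items()
    }


for approach, (located, sources, rate) in location_rates(ROWS).items():
    print(f"{approach}: {located} of {sources} located ({rate:.0%})")
```

By this tally, 38 of 41 verbatim sources were located, against 5 of 15 reworded sources, and all five of those came from R2, where posts were found via the thread title.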

Improving disguise

Could we automate this?

Spin Rewriter

WordAi
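One automatable check, sketched here under stated assumptions, is to test whether a reworded phrase still shares short runs of words with the original, since any surviving run could be fed to an exact-phrase search. This is not how Spin Rewriter or WordAi work, and it is not the test the researchers in the table used; the function names and all three example phrases are invented for illustration.

```python
import re


def shingles(text: str, n: int = 3) -> set:
    """Return the set of n-word runs (shingles) in text, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_shingles(original: str, disguised: str, n: int = 3) -> set:
    """Word runs that survive rewording and so remain searchable verbatim."""
    return shingles(original, n) & shingles(disguised, n)


# Invented phrases; none is drawn from a real post or report.
original = "I never told my sister how worried I was"
light = "I didn't tell my sibling how worried I was"
heavy = "My worry for my sister stayed a secret"

print(shared_shingles(original, light))  # two three-word runs survive
print(shared_shingles(original, heavy))  # empty set: no run survives
```

An empty result does not show that a disguise is safe. A non-exact search, or a thread title as in R2, might still locate the source.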

… your turn

https://reagle.org/disguise

Thank you!

Macaca nigra self-portrait (rotated and cropped)