Reddit masking interview with Meredith Meacham

  1. In your paper, you wrote: “Quotations are slightly altered to reduce chances of deductive disclosure of individuals who made the posts or comments.” Is this the case for most of your work using online sources?

R: Yes, this is what I try to do with most of my work using online sources, especially because I study substance use behaviors that are often illegal and stigmatized. I believe I came across this practice on the recommendation of an editor or reviewer.

  2. I tried to find the original Reddit messages you used in your paper using different search engines and techniques. You can see the results in the attached spreadsheet, where I found 10-12 of the quotes. (Two of them might be a stretch.) I was able to use RedditSearch/Pushshift on a bunch of these. Were you previously aware that there is an external index? (I don’t think most people are.)

R: Oh no! I thought I had successfully disguised them, but this is hard to do without changing the meaning, and it still feels like an odd scientific practice. I tried to disguise posts by changing or omitting a few words here and there until I couldn’t find the quote + reddit in Google. I do use Pushshift for my other large-scale Reddit work (here and under review). I intentionally used fake subreddit names, but it’s interesting that the year helped. I’ve seen other papers name drug-related subreddits, but I go back and forth all the time about being fully transparent and reproducible vs. “protecting”(?) the identity of these online communities.

  3. How has your practice changed over time? Do you anticipate making changes to your practice?

R: I don’t think my practice has changed, although I might start naming the large subreddits like r/trees. I might also create composite quotations from multiple posts. When I review papers that use Reddit quotations, I’m not sure whether to request this kind of obfuscation or not. I haven’t heard of any direct harms coming from quotations in academic papers or from naming subreddits, but that doesn’t mean they haven’t happened or won’t.

I’ve had some ideas and discussions with colleagues about asking people in online communities directly about their preferences, but these preferences are probably heterogeneous and dynamic. Furthermore, what do we do if the response is overwhelmingly “go away, researchers”?

To clarify, I think I searched for “ ‘disguised quote’ ‘reddit’ ” in Google. Specifying the search in site:reddit.com with the year sounds like another good strategy. This particular quote was hard to disguise since it uses such specific language, but I thought it was a good representation of the kind of questions people posed.

Some other thoughts about naming subreddits: I think I might now name a well-known subreddit like r/trees. A reporter for a cannabis outlet covered my other Reddit paper in 2018 and asked me to confirm it was r/trees. I think I was vague in my response, but they still reported it as r/trees. I also reached out to the r/trees moderators, and the one who responded said it was fine with them to name the subreddit. However, I’d probably avoid naming some other more sensitive subreddits I’ve come across, like one for parents who use cannabis and another where people (and what look like underage teenagers) post videos of themselves taking bong and dab hits.
