- In your paper, you wrote: “Quotations are slightly altered to reduce
chances of deductive disclosure of individuals who made the posts or
comments.” Is this the case for most of your work using online
sources?
R: Yes, this is what I try to do with most of my work using online
sources, especially because I study substance use behaviors that are
often illegal and stigmatized. I believe I came across this practice on
the recommendation of an editor or reviewer.
- I tried to find the original Reddit messages you used in your paper
by way of different search engines and techniques. You can see the
results in the attached spreadsheet, where I found 10-12 of the quotes.
(Two of them might be a stretch.) I was able to make use of
RedditSearch/Pushshift on a bunch of these. Were you previously aware
there is an external index? (I don’t think most people are.)
R: Oh no! I thought I had successfully disguised them, but this is
hard to do without changing the meaning and still feels like an odd
scientific practice. I tried to disguise posts by changing or omitting a
few words here and there until I couldn’t find the quote + reddit in
Google. I do use Pushshift for my other large scale Reddit work (here
and under review). I intentionally used fake subreddit names, but it’s
interesting that the year helped. I’ve seen other papers name
drug-related subreddits but I go back and forth all the time about being
fully transparent and reproducible vs. “protecting”(?) the identity of
these online communities.
- How has your practice changed over time? Do you anticipate making
changes to your practice?
R: I don’t think my practice has changed, although I might start
naming the large subreddits like r/trees. I might also create composite
quotations from multiple posts. When I review papers that use Reddit
quotations, I’m not sure whether to request this kind of obfuscation or
not. I haven’t heard of any direct harms coming from quotations in
academic papers or naming subreddits, but that doesn’t mean they haven’t
happened or won’t.
I’ve had some ideas and discussions with colleagues about asking
people in online communities directly about their preferences, but these
preferences are probably heterogeneous and dynamic. Furthermore, what do
we do if the response is overwhelmingly “go away, researchers”?
…
To clarify, I think I searched for “ ‘disguised quote’ ‘reddit’ ” in
Google. Specifying the search in site:reddit.com with the year sounds
like another good strategy. This particular quote was hard to disguise
since it uses such specific language, but I thought it was a good
representation of the kind of questions people posed.
Some other thoughts about naming subreddits: I think I might now name
a well-known subreddit like r/trees. A reporter for a cannabis outlet
covered my other Reddit paper in 2018 and asked me to confirm it was
r/trees. I think I was vague in my response, but they still reported it
as r/trees. I also reached out to the r/trees moderators and the one who
responded said it was fine with them to name the subreddit. However, I’d
probably avoid naming some other more sensitive subreddits I’ve come
across, like one for parents who use cannabis and another where people
(and what look like underage teenagers) post videos of themselves
taking bong and dab hits.