One, many, and millions

Joseph Reagle

One, many, and millions:

Will QS and big-N solve science’s reproducibility crisis?

part of work on life hacking

by Joseph Reagle, Northeastern

Saltelli, Ravetz, and Funtowicz (2016)

When students conceive of a scientific exercise as a ‘hack’ rather than a ‘proof’, a new consciousness is being created.

Could citizen science and scientist-citizens together perform the rescue of quality and trust in science? It is much too early to say… (Saltelli, Ravetz, and Funtowicz 2016)

Possible solutions?

Quantified Self / single-N (N=1)
big-N (N=1*10⁶)

… but what, exactly, is the crisis?

N=42

Cuddy at TED

“A free no-tech life hack”

change your posture for two minutes… it could significantly change the way your life unfolds (Cuddy 2012).

Carney says “not real”

I do not think the effect is real (Carney 2016).

Example of the “crisis”

method: hormonal tests were taken from risk-taking subjects who were told they won an extra prize of $2; the testosterone effect “may merely be a winning effect, not an expansive posture effect.”
analysis: data dredging, i.e., ignoring data, selectively removing outliers, and reporting only the statistical tests that showed significance (Carney 2016).
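The statistical core of “reported only the statistical tests that showed significance” can be shown with a small simulation. This is a hypothetical sketch, not a reanalysis of the power-posing data: it assumes N=42 (21 per group), five independent outcome measures, no true effect at all, and normally distributed data with a known standard deviation so that a simple z-test applies. The names `two_sample_p` and `false_positive_rates` are illustrative.

```python
import math
import random

# Hypothetical simulation: no real data from any power-posing study is used.

def two_sample_p(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means when the population SD is known (z-test)."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (sum(a) / len(a) - sum(b) / len(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rates(trials=5000, n_per_group=21, n_outcomes=5, alpha=0.05, seed=1):
    """Simulate studies with NO true effect; compare honest vs. selective reporting."""
    rng = random.Random(seed)
    honest_hits = dredged_hits = 0
    for _ in range(trials):
        p_values = []
        for _ in range(n_outcomes):
            # Both groups share one distribution, so any "effect" is noise.
            treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
            control = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(two_sample_p(treated, control))
        honest_hits += p_values[0] < alpha     # one outcome, chosen in advance
        dredged_hits += min(p_values) < alpha  # report whichever outcome "worked"
    return honest_hits / trials, dredged_hits / trials


honest, dredged = false_positive_rates()
print(f"one pre-chosen outcome: {honest:.3f}")   # should land near 0.05
print(f"best of five outcomes:  {dredged:.3f}")  # should land near 1 - 0.95**5, about 0.23
```

With five independent outcomes, the chance that at least one crosses p < .05 is 1 − 0.95⁵ ≈ 0.23 even though nothing is there. Selectively removing outliers or ignoring data adds further chances of the same kind.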

“Secondary to the key effect”

Cuddy conceded: the most rigorous attempt at replication—a registered study—failed to find the behavioral and physiological changes…

… but it confirmed subjects’ reports of feeling more powerful; the behavioral and physiological changes “are secondary to the key effect” and the subject of continuing research (Cuddy 2016).

but …

People’s self-reports are notoriously unreliable: subjects often report what they want to believe.

Had the change in feeling been the only finding, it would not have merited a TED talk, life hack, or new self-help regime.

Solutions to the crisis?

N=1 (QS/single-N)

N=1*10⁶ (big-N)

N=1

Seth Roberts (1953–2014)

suspicious of academic, health, and science gatekeepers
QS experimentalist focused on discovery
author of Shangri La Diet
“Butter makes me smarter”

I cannot remember ever hearing a study proposed that I thought was too small; and I have heard dozens of proposed studies that I thought were too large. (Roberts 2005)

Many people have complained about a lack of replicability problem in psychology… An obvious solution is to raise the bar for publication: require better (= stronger) evidence. Sure, this will improve the quality of testing, but how will it affect the rate of production of plausible new ideas? (Roberts 2014)

He also discovered a different posture hack

Standing and sleep

In 1996, I accidentally discovered that if I stood a lot I slept better… to get any improvement I had to stand at least 8 hours. That wasn’t easy, and after about 9 hours of standing my feet would start to hurt. I stopped standing that much. It was fascinating but not practical. (Roberts 2011)

One-legged standing

In 2008, I accidentally discovered that one-legged standing could produce the same effect. If I stood on one leg “to exhaustion”… At first I stood with my leg straight but after a while my legs got so strong it took too long. When I started standing on one bent leg, I could get exhausted in a reasonable length of time (say, 8 minutes), even after many days of doing it. (Roberts 2011)

New ideas about cause-effect

12 years of self-experimentation led to the discovery of several surprising cause-effect relationships and suggested a new theory of weight control, an unusually high rate of new ideas. – “Self-Experimentation as a Source of New Ideas: Ten Examples about Sleep, Mood, Health, and Weight” (Roberts 2004)

I believe his claim of “cause-effect”

is too strong

“Cannot be placebo”?

I discovered the effect by accident. One morning I woke up feeling much more rested than usual and wondered why. The one-legged standing I’d done before wasn’t even on my initial list of possible reasons. Because the effect surprised me, it cannot be a placebo effect. (Roberts 2011)

… but it can be placebo

data dredging: notice an arbitrary blip in the data, even an “accidental” one
then continue to confirm it under placebo bias
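Both steps can be mimicked with a short simulation. This is a hypothetical sketch, not a model of Roberts’ own records: it assumes a year of daily ratings that are pure noise, treats any day more than 2 SD above the mean as a “surprise,” and assumes, for illustration only, that expecting an improvement inflates self-ratings by 0.5 SD once the hack begins on day 180. All function names are hypothetical.

```python
import math
import random

# Hypothetical self-tracking year: the "hack" has NO real effect.

def surprising_days(ratings, k=2.0):
    """Indices of days rated more than k SDs above the mean: candidate 'discoveries'."""
    mean = sum(ratings) / len(ratings)
    sd = math.sqrt(sum((r - mean) ** 2 for r in ratings) / (len(ratings) - 1))
    return [i for i, r in enumerate(ratings) if r > mean + k * sd]


def self_experiment(days=365, start=180, expectation_bias=0.5, seed=7):
    """Return (true_rest, rated_rest) for a simulated year.

    true_rest:  an objective measure, pure noise with the same distribution all year.
    rated_rest: the self-report, nudged upward from day `start` onward because the
                experimenter now expects to feel better (an assumed expectation bias).
    """
    rng = random.Random(seed)
    true_rest = [rng.gauss(0, 1) for _ in range(days)]
    rated_rest = [t + (expectation_bias if day >= start else 0.0)
                  for day, t in enumerate(true_rest)]
    return true_rest, rated_rest


def before_after_diff(series, start):
    """Mean from day `start` onward minus the mean before it."""
    before, after = series[:start], series[start:]
    return sum(after) / len(after) - sum(before) / len(before)


true_rest, rated_rest = self_experiment()
blips = surprising_days(true_rest)
print(f"'surprising' mornings in a year of noise: {len(blips)}")  # about 8 expected on average
print(f"self-rated change: {before_after_diff(rated_rest, 180):+.2f} SD")
print(f"objective change:  {before_after_diff(true_rest, 180):+.2f} SD")
```

A year of noise reliably supplies a handful of remarkable mornings, and a modest expectation bias then “confirms” the chosen explanation in the self-reports while the objective measure stays flat.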

single-N

is not the answer to the replicability crisis

N=1*10⁶

big-N

aggregations of N=1

“QS data commons”

big-N limitations

easy to find statistical significance in large datasets, even for tiny effect sizes
hypothesis formation and testing should still be distinct
selection bias is a problem, especially among QS & gadget enthusiasts
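The first and third limitations can be made concrete with a short sketch. It assumes two equal groups and normally distributed outcomes with unit variance, so effect sizes are Cohen’s d and a z-test applies, and it treats N=10⁶ as two groups of 500,000. The enrollment rule is a hypothetical logistic one in which people with a higher value of the measured trait are more likely to join. All function names are illustrative.

```python
import math
import random
from statistics import NormalDist


def p_for_effect(d, n_per_group):
    """Two-sided p-value if the observed standardized difference (Cohen's d) equals d,
    using a z-test with two equal groups of known unit variance."""
    z = d * math.sqrt(n_per_group / 2)
    return math.erfc(abs(z) / math.sqrt(2))


def min_significant_d(n_per_group, alpha=0.05):
    """Smallest |d| that reaches p < alpha with two equal groups."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z_crit * math.sqrt(2 / n_per_group)


def enrolled_mean(population=200_000, seed=3):
    """Selection-bias sketch: the chance of enrolling rises with the measured trait
    (hypothetically, enthusiasm for tracking gadgets). The population mean is 0."""
    rng = random.Random(seed)
    enrolled = []
    for _ in range(population):
        trait = rng.gauss(0, 1)
        if rng.random() < 1 / (1 + math.exp(-trait)):  # logistic enrollment probability
            enrolled.append(trait)
    return sum(enrolled) / len(enrolled), len(enrolled)


for n in (21, 500_000):  # N=42 vs. N=10^6, each split into two equal groups
    print(f"n per group {n:>7}: smallest significant d = {min_significant_d(n):.4f}, "
          f"p for d=0.01 is {p_for_effect(0.01, n):.2g}")

biased_mean, n_enrolled = enrolled_mean()
print(f"{n_enrolled} enrolled, mean trait {biased_mean:.2f} (population mean is 0)")
```

With 21 subjects per group, only an effect of about d = 0.6 reaches significance; with 500,000 per group, d ≈ 0.004 does, so significance alone says little about importance. The enrollment sketch shows the other side: a large, self-selected sample gives a precise estimate of the wrong quantity.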

e.g., NIH’s “Findable, Accessible, Interoperable and Reusable” pilot

big-N benefits

if open
- massive data sets for hypothesis generation, correlation, and maybe even natural or designed experiments
- distributed analysis and reproduction
if citizen science
- greater accountability and trust
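Massive open datasets for hypothesis generation need not blur the line between forming and testing hypotheses. One common safeguard, offered here as an assumption rather than anything this talk or the NIH pilot prescribes, is to split a shared dataset: dredge freely in one half, then test only the single chosen hypothesis in the untouched half. The sketch assumes 200 rows, 20 candidate variables, and no true effects at all. `corr_p` uses the Fisher z approximation, and every name is hypothetical.

```python
import math
import random


def corr_p(x, y):
    """Two-sided p-value for a Pearson correlation via the Fisher z approximation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def explore_then_confirm(reps=200, rows=200, n_vars=20, alpha=0.05, seed=11):
    """Hypothetical pooled dataset in which NONE of the candidate variables affects the outcome.

    Half the rows generate a hypothesis (pick the most promising variable);
    the other, untouched half tests only that one hypothesis.
    """
    rng = random.Random(seed)
    half = rows // 2
    explore_hits = confirm_hits = 0
    for _ in range(reps):
        outcome = [rng.gauss(0, 1) for _ in range(rows)]
        candidates = [[rng.gauss(0, 1) for _ in range(rows)] for _ in range(n_vars)]
        # Exploration half: dredge through every candidate and keep the best p-value.
        p_explore = [corr_p(v[:half], outcome[:half]) for v in candidates]
        best = min(range(n_vars), key=lambda i: p_explore[i])
        explore_hits += p_explore[best] < alpha
        # Confirmation half: one pre-specified test of the chosen variable.
        confirm_hits += corr_p(candidates[best][half:], outcome[half:]) < alpha
    return explore_hits / reps, confirm_hits / reps


explore_rate, confirm_rate = explore_then_confirm()
print(f"'discoveries' in the exploration half: {explore_rate:.2f}")  # roughly 1 - 0.95**20, about 0.64
print(f"confirmed in the held-out half:        {confirm_rate:.2f}")  # roughly 0.05
```

In the exploration half, the best of 20 noise variables usually looks significant; in the held-out half, it is confirmed only at about the nominal 5% rate, which is what keeping generation and testing distinct buys.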


but I agree it is still “too early to tell” (Saltelli, Ravetz, and Funtowicz 2016, pp. 25–26)

Conclusion

“Solving” the reproducibility crisis

Any solution is independent of N:

N=1: little evidence so far
N=1*10⁶: perhaps, if done well

We should

support (single-N) discovery—Roberts’ concern
support (larger-N) confirmation—institutional concern
not abandon rigor: counter data dredging, p-hacking, the file-drawer effect, etc.
