The Nuance of the Gendergap Statistics

I don't often work on quantitative projects, but since publishing Gender Bias in Wikipedia and Britannica with Lauren Rhue I've come to appreciate just how difficult it can be to communicate findings unambiguously. Of course, had we found that Wikipedia had no biographies of women, that would have been straightforward enough. However, what we found was a bit more nuanced, and I tried to capture that in a single sentence within the abstract:

We conclude that … Wikipedia articles on women are more likely to be missing than articles on men relative to Britannica.

I worked on that sentence for a while, trying to communicate that these findings are with respect to proportions of missing articles and relative to Britannica, but it is easily misunderstood. For instance, Nathan Matias summarized the paper in a recent blog posting as:

Wikipedia covers more women than Brittanica, although the Wiki is more likely to be missing articles on key women.

I think the erroneous fragment "more likely to be missing articles on key women" is a consequence of poor communication on our part.

That sentence in our abstract is trying to communicate the following: Wikipedia's domination of Britannica in biographical coverage is greater for men than for women. We found this in a comparison of percentages and a logistic regression. First, "while Wikipedia had nearly twice the number of female biographies than did Britannica (113 to 60), it had over two and a half times the number of male biographies (673 to 254)" (p. 1145). That is, Wikipedia trounced Britannica with respect to both men and women, but by a wider margin when it came to male biographies. Second, in the logistic regression, "Male and Unknown, have non-significant coefficients, suggesting that the influence of gender may not be consistent across both reference works. In contrast, the Male in Wikipedia coefficient is significant, providing evidence that gender contributes to the subject’s degree of coverage on Wikipedia" (pp. 1147-1148). We then attempted to summarize this in the conclusion as follows:

While Wikipedia has more biographies of women than does Britannica in absolute terms (Table 1), Wikipedia tends to be less balanced in whom it misses than is Britannica as seen in the percentages of missing articles (Table 2) and the positive and significant Male coefficient in the logistic regression (Table 3)

So, two more comprehensible ways we might put this are:

  1. Wikipedia dominates Britannica in biographical coverage, but more so when it comes to men.
  2. Britannica is more balanced in whom it neglects to cover than Wikipedia.
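
The arithmetic behind the first phrasing is easy to check. The sketch below uses only the four biography counts quoted above from p. 1145; the names `counts` and `coverage_ratio` are mine, for illustration. It reproduces the simple ratio comparison only, not the percentages of missing articles or the logistic regression reported in the paper.

```python
# Biography counts as quoted above from the paper (p. 1145).
counts = {
    "female": {"wikipedia": 113, "britannica": 60},
    "male": {"wikipedia": 673, "britannica": 254},
}


def coverage_ratio(group):
    """Number of Wikipedia biographies per Britannica biography for one group."""
    c = counts[group]
    return c["wikipedia"] / c["britannica"]


female_ratio = coverage_ratio("female")
male_ratio = coverage_ratio("male")

print(f"female: {female_ratio:.2f}x")  # female: 1.88x
print(f"male:   {male_ratio:.2f}x")    # male:   2.65x
print("Wikipedia's lead is larger for men:", male_ratio > female_ratio)
```

Both ratios are above 1, so Wikipedia has more biographies than Britannica for women and for men alike; the gap is simply wider for men (about 2.65 to 1 versus about 1.88 to 1).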

