The controversy about gender imbalance and sexism in open content communities has been remarkable this summer, and this week's news about Shuttleworth's comments might mean it will extend into the autumn. While I think these events merit a historical and cultural analysis -- and prompt the question of whether sexism has increased, is being noticed more, or both -- I want to postpone that undertaking for the moment. Instead, I wonder whether the recent demographic data showing that women are about 13% of Wikipedians is reflected in Wikipedia's topical coverage.
As you can see in this Comparison of Biographies, Wikipedia does very well in its coverage of National Women's History Project (NWHP) biographies relative to Encyclopaedia Britannica.
64 entries are missing from EB; 23 from WP. 23 entries are in neither. For articles existing in both, WP articles are 6.29 times larger on average (median of 4.00).
That is, of 174 women, Wikipedia is missing articles on 23 of them. That's a little over a third of the 64 missing from Britannica, which doesn't have any articles that Wikipedia lacks. When both do have an article, the Wikipedia article has much more content. Of course, those are just the numbers. Even so, when I browse the actual articles, I am partial to the extra content and images of Wikipedia.
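The counts above can be cross-checked with a little inclusion-exclusion. This is only a sketch of the arithmetic, not the script behind the comparison; the function name `coverage_breakdown` and the dictionary keys are my own illustrative choices.

```python
# Counts as reported in the comparison above.
TOTAL = 174         # women on the NWHP list
MISSING_EB = 64     # no article in Encyclopaedia Britannica
MISSING_WP = 23     # no article in Wikipedia
MISSING_BOTH = 23   # no article in either work


def coverage_breakdown(total, missing_eb, missing_wp, missing_both):
    """Split the list into four disjoint groups by inclusion-exclusion."""
    only_wp = missing_eb - missing_both   # absent from EB, present in WP
    only_eb = missing_wp - missing_both   # absent from WP, present in EB
    in_both = total - missing_eb - missing_wp + missing_both
    return {
        "both": in_both,
        "only_wp": only_wp,
        "only_eb": only_eb,
        "neither": missing_both,
    }


breakdown = coverage_breakdown(TOTAL, MISSING_EB, MISSING_WP, MISSING_BOTH)

# Share of the NWHP list that each work covers.
wp_coverage = (TOTAL - MISSING_WP) / TOTAL
eb_coverage = (TOTAL - MISSING_EB) / TOTAL

# Wikipedia's gaps as a fraction of Britannica's gaps.
gap_ratio = MISSING_WP / MISSING_EB

print(breakdown)
print(f"WP covers {wp_coverage:.1%}, EB covers {eb_coverage:.1%}")
print(f"WP is missing {gap_ratio:.0%} as many entries as EB")
```

Because every entry missing from Wikipedia is also missing from Britannica, the "only_eb" group comes out empty, which is the same as saying Britannica has no article that Wikipedia lacks.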
Yet a difficulty in this work is finding a useful corpus of biographical subjects. To say that there are more articles about men than women in any reference work isn't saying much, given world history. So, for this analysis I use those women recognized by the NWHP for Women's History Month. The NWHP list is a nice collection in that it has both well-known women and lesser-known women who are thought to be notable nonetheless. However, this only tells us that Wikipedia has greater coverage of women than a traditional encyclopedia. (And while this is one of the first large and topical -- rather than quality -- comparisons, it should not be all that surprising given Wikipedia's size.) Moreover, Wikipedians are aware of their own systemic bias and make attempts to counter it. For example, those recognized by Black History Month were the focus of a WikiProject that documented every person recognized. (Ironically, this list was taken from Britannica. And perhaps the NWHP list will prompt a similar project at Wikipedia, which is why I use permanent links to the specific versions I analyzed.)
What would really be nice is a source corpus of notable persons, both male and female. I could then compare this against Wikipedia and Britannica to see how they fare relative to the source corpus. That is, a source corpus of 100 people might recognize 75 men and 25 women (25% female), and if one of the references had a 60/15 split, it'd be less "feminine" (20%) than the source. How, then, do the reference works compare to one another, relative to their source? If you have a suggestion for such a corpus, please leave a comment.
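The hypothetical 75/25 versus 60/15 comparison can be written out as a tiny calculation. This is a sketch only: the helper `female_share` and the idea of dividing a reference's share by the source's share are illustrative assumptions on my part, not an established measure.

```python
from fractions import Fraction


def female_share(men, women):
    """Fraction of a set of biographies that are about women."""
    return Fraction(women, men + women)


# Hypothetical numbers from the example above.
source_share = female_share(men=75, women=25)      # the source corpus
reference_share = female_share(men=60, women=15)   # one reference work

# Assumed comparison: a ratio below 1 means the reference work is
# less "feminine" than the corpus it is measured against.
relative_share = reference_share / source_share

print(f"source: {float(source_share):.0%} female")
print(f"reference: {float(reference_share):.0%} female")
print(f"reference relative to source: {float(relative_share):.2f}")
```

Computing the same ratio for each reference work against the same source corpus would give a way to compare them to one another.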
Finally, while I was speaking with Nora about this, she raised the question of whether the gender ratio of disruptive editors differs from that of the larger community. Our hypothesis is that disruptive editors might be disproportionately male. But who can say? Unfortunately, I expect it's difficult to get survey responses from such editors.