Gender Bias, Part II

In the previous analysis of the 174 women from the National Women’s History Project, Wikipedia lacked articles on 23 of the women and Britannica lacked 65. Hence, I found no support for the idea that the gender imbalance among Wikipedians leads to a similar imbalance in biographical coverage. However, this did support the (unsurprising) fact that Wikipedia has greater coverage, both in its number of subjects and in article length. Still, as noted, on the gender question it would be nice to have a sense of relative proportions.

Consequently, in this second analysis I look at Time’s “100” most influential people from 2008. (There are more than 100 subjects because I break out a few couples into separate entries.)

43 entries are missing from Britannica (EB) and 4 from Wikipedia (WP); 4 entries are in neither. For articles existing in both, WP articles are 7.66 times larger on average (median of 6.81).

Of the 105 entries, I guess that 23 are female, 82 are male, and 0 are unknown. That is, the ratio of females to males is 0.28 (23/82). Among the Wikipedia articles, the female-to-male ratio is 0.29 (23/78); at Britannica it is 0.27 (13/49).

That is, while one might claim that this ratio of 0.28 is evidence of a bias, on the part of Time or the world at large, it is a baseline against which we can judge the reference works: neither Wikipedia nor Britannica is disproportionately better or worse. If the reference works were biased towards coverage of men, we would expect their ratios to be lower than 0.28 (e.g., if all missed articles were females).
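To make the arithmetic concrete, here is a minimal sketch that recomputes the ratios from the counts reported above. The function and variable names are my own for illustration and are not taken from the scripts used for the analysis; the last figure is a hypothetical, showing what Wikipedia's ratio would look like if its 4 missing entries had all been women.

```python
# Counts of (female, male) subjects as reported in the post.
counts = {
    "Time list": (23, 82),
    "Wikipedia": (23, 78),
    "Britannica": (13, 49),
}


def female_to_male_ratio(females, males):
    """Return the female-to-male ratio, rounded to two decimals."""
    return round(females / males, 2)


ratios = {name: female_to_male_ratio(f, m) for name, (f, m) in counts.items()}

# Hypothetical scenario (not observed): Wikipedia still misses 4 entries,
# but all 4 are women, so it covers 19 women and all 82 men.
wp_missing = 4
biased_ratio = female_to_male_ratio(23 - wp_missing, 82)

for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
print(f"Wikipedia if all {wp_missing} missing entries were women: {biased_ratio}")
```

In that hypothetical the ratio falls below the 0.28 baseline, which is the kind of signal the comparison is looking for and does not find.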

Of course, I’d like to run this over a larger corpus, but in terms of easy-to-find lists of notable persons, these “100” lists are all I’ve found so far. Also, I’m again relying upon heuristics to guess the gender of subjects, but they seem to be working well. (EB’s Mia Farrow article is guessed as male because it’s actually a stub/sentence in the Woody Allen article.) Finally, an additional feature of my approach is that it can augment the table with the content from both reference works, but I expect Britannica would not be happy about that, so I don’t provide that version publicly.

Ported/Archived Responses

Danielle Maedchenspiele on 2009-10-14

I can’t believe that Britannica missed 65 articles! However, congrats for the analysis, I found it really interesting because I am studying this topic as well!

Sue Gardner on 2009-10-12

Really interesting, Joseph, thanks. I’ve been looking for this kind of analysis.

Joseph Reagle on 2009-10-13

Thanks Sue. I did find a nice corpus that I’m prepping for analyzing, but don’t expect to find anything contrary to what we see here.

Joseph Reagle on 2009-10-03

Right you are, Axel. I misspoke. I use the ratio instead of percent of total in case there are gender “unknowns”. (I could also do percentage of a gender relative to all less the unknowns.)

Axel Boldt on 2009-10-03

Nice analysis, thanks. One tiny point: the female/male ratio being 0.28 doesn’t mean that 28% of subjects are females. 23/105 = 22% of the subjects are females. 0.28 to 1 are the odds for a subject to be female.
