Now published in Review of English Studies (Advance Access): an article of mine on how the Oxford English Dictionary has treated texts authored by women in marshalling citation evidence for English-language lexis, from the first edition (1884-1928) to the current OED3 revision (2000-). The approach is driven by quantitative analysis, so I also do some work here with the Library of Congress catalog and the HathiTrust Digital Library.

Accompanying the article proper is a Supplementary Data file incorporating notes on data curation and analysis policies, and a number of aggregated datasets. I’ll do my best to answer any queries regarding these.

The article and notes file can be viewed freely on the RES website (I recommend the PDF version), and from the links below:

A few highlights from the visual aids sections of the piece:

  • Here’s a figure that shows that analyzing small subsets of quotations, like those from early English novels, can lead to distorted results, because with small samples OED data can be sensitive to discrete publication events. The graphs compare OED to ENBS (the Garside et al. survey of English novels 1770-1836).

  • Here’s a figure that shows just how low citation rates are and have been for books written by women (as a % of citations of all books). It compares quotation rates in various OED editions with authorship rates (by volume) in the Library of Congress catalog and the HathiTrust Digital Library catalog.


  • Here’s a figure that compares the OED citation rates above to the Hathi rates above, expressed as a proportional excess or deficit. In other words, for a given 25-year period, is OED over- or under-quoting women-authored works? One of these graphs is not like the others.


  • Here’s a figure that breaks down some OED3 (2000-) citation rates according to the Subject category to which the sense definition is assigned. It shows that, any way you slice it, Subject categories tend to exaggerate the underlying male bias.

  • Here’s a figure that shows that even if gender is differently distributed across subject areas in the available texts (Library of Congress is a proxy here for “what’s out there”, LC classes for “subject areas”), OED is, as of now, under-citing women authors in every LC class, and in a big way in two of the most common (i.e. where the evidence is plentiful), namely Literature (PN-PZ) and Sciences (Q, R, S, T).
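
For anyone wondering what “proportional excess or deficit” means in the third figure above, here is a minimal sketch in Python. It assumes the measure is the difference between the OED rate and the Hathi rate, divided by the Hathi rate. That formula is my shorthand for this post, so check the Supplementary Data notes for the exact definition. The function name and the two sample periods are hypothetical, and the numbers are placeholders, not values from the article.

```python
def proportional_difference(oed_rate, baseline_rate):
    """Return (oed_rate - baseline_rate) / baseline_rate.

    Positive means over-quoting relative to the baseline, negative means
    under-quoting. This formula is an assumption made for illustration.
    """
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (oed_rate - baseline_rate) / baseline_rate


# Placeholder rates (share of women-authored works), NOT real data:
# each entry is (OED citation rate, baseline catalog rate).
placeholder_periods = {
    "period A": (0.05, 0.10),
    "period B": (0.12, 0.10),
}

for label, (oed, baseline) in placeholder_periods.items():
    diff = proportional_difference(oed, baseline)
    direction = "over-quoting" if diff > 0 else "under-quoting"
    print(f"{label}: {diff:+.0%} ({direction})")
```

Dividing by the baseline rate, rather than just subtracting, makes periods with very different underlying authorship rates comparable on one scale.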

Have a look at the article for further commentary. Happy to talk more with anyone who might be interested, in the comments below or offline as well.

