Tag Archives: genre

One last round with metadata from Hathi and Underwood

In “Hathi’s Automatic Genre Classifier” and “Hathi Genre Again – Zero Recall”, I ran a couple of experiments comparing genre categories assigned by human taggers working on the Life of Words OED mark-up project to two sources of genre metadata associated with the HathiTrust Digital Library. The first post looked at data from the automatic […]

OED Subject Matter

In my last post I described using HathiTrust’s Solr Proxy API to fetch Hathi genre metadata for OED quotations. But genre is not the only metadata that Hathi sends back down the intertubes when I ask it a question. For most works, I also get a Library of Congress Classification code for the volume. This […]

Hathi Genre Again – Zero Recall

In “Hathi’s Automatic Genre Classifier” [17.01.06] I compared the consolidated automatic genre metadata for a subset of HathiTrust Digital Library texts (available here) to the genre classifications arrived at for human-inspected works as part of the OED quotation tagging project underway at The Life of Words. My process there was pretty closely supervised, but the […]

Guest Post: Magazines and the Dentist Test

Cosmin Dzsurdzsa is a research assistant working on identifying the textual genre of quotations in the OED. Here he writes the first in a series of posts on borderline and difficult genre determinations. Filtering quotation blocks is essential to optimizing our results with the quantity of data we deal with here at LOW. For a […]

Hathi’s Automatic Genre Classifier

The HathiTrust Digital Library is a massive collection of digital books: As of 2017, it contains 5 billion pages from 15 million volumes (7 million titles). About 40% of these are public-domain works, meaning anyone can search and read them. Some of these have been marked for their textual genre. Here I do a little […]

How did OED Supplements Supplement?

There has always been an interest in the changing editorial practice within and between various editions of the Oxford English Dictionary. Recently some scholars have complained that changing electronic interfaces are making it impossible to distinguish what edition a particular definition or quotation is coming from. See, e.g., Charlotte Brewer, “OED Online Re-launched: Distinguishing old […]