Tag Archives: genre

Guest Post: Don’t go breaking (up) my genre: visualizing genre against attributes

Danielle Griffin is a research assistant on her third co-op term at The Life of Words. This is the first of a few posts based on her last work-term report, “Comparative Data Visualizations of Textual Features in the OED and the Life of Words Genre 3.0 Tagging System”. Danielle’s report won the Quarry Integrated Communication Co-op […]

Guest Post: Cataloguing the Catalogue

Cosmin Dzsurdzsa is a research assistant at The Life of Words. In my last guest post I discussed problematic magazine classifications. Now, once again, a periodical publication proves to be an exciting and difficult genre identification challenge. The kind of text I will be dealing with today is the “catalogue” (filtered out of our data […]

One last round with metadata from Hathi and Underwood

In “Hathi’s Automatic Genre Classifier” and “Hathi Genre Again – Zero Recall”, I ran a couple of experiments comparing genre categories assigned by human taggers working on the Life of Words OED mark-up project to two sources of genre metadata associated with the HathiTrust Digital Library. The first post looked at data from the automatic […]

OED Subject Matter

In my last post I described using HathiTrust’s Solr Proxy API to fetch Hathi genre metadata for OED quotations. But genre is not the only metadata that Hathi sends back down the intertubes when I ask it a question. For most works, I also get a Library of Congress Classification code for the volume. This […]

Hathi Genre Again – Zero Recall

In “Hathi’s Automatic Genre Classifier” [17.01.06] I compared the consolidated automatic genre metadata for a subset of HathiTrust Digital Library texts (available here) to the genre classifications arrived at for human-inspected works as part of the OED quotation tagging project underway at The Life of Words. My process there was pretty closely supervised, but the […]

Guest Post: Magazines and the Dentist Test

Cosmin Dzsurdzsa is a research assistant working on identifying the textual genre of quotations in the OED. Here he writes the first in a series of posts on borderline and difficult genre determinations. Filtering quotation blocks is essential to optimizing our results with the quantity of data we deal with here at LOW. For a […]

Hathi’s Automatic Genre Classifier

The HathiTrust Digital Library is a massive collection of digital books: As of 2017, it contains 5 billion pages from 15 million volumes (7 million titles). About 40% of these are public-domain works, meaning anyone can search and read them. Some of these have been marked for their textual genre. Here I do a little […]

How did OED Supplements Supplement?

There has always been an interest in the changing editorial practice within and between various editions of the Oxford English Dictionary. Recently some scholars have complained that changing electronic interfaces are making it impossible to distinguish which edition a particular definition or quotation comes from. See, e.g., Charlotte Brewer, “OED Online Re-launched: Distinguishing old […]