Tag Archives: corpora

OED Subject Matter

In my last post I described using HathiTrust’s Solr Proxy API to fetch Hathi genre metadata for OED quotations. But genre is not the only metadata that Hathi sends back down the intertubes when I ask it a question. For most works, I also get a Library of Congress Classification code for the volume. This […]

Hathi Genre Again – Zero Recall

In “Hathi’s Automatic Genre Classifier” [17.01.06] I compared the consolidated automatic genre metadata for a subset of HathiTrust Digital Library texts (available here) to the genre classifications arrived at for human-inspected works as part of the OED quotation tagging project underway at The Life of Words. My process there was pretty closely supervised, but the […]

Hathi’s Automatic Genre Classifier

The HathiTrust Digital Library is a massive collection of digital books: As of 2017, it contains 5 billion pages from 15 million volumes (7 million titles). About 40% of these are public-domain works, meaning anyone can search and read them. Some of these have been marked for their textual genre. Here I do a little […]

Guest Post: Strong and Weak Genre Classification

Over the summer we’re featuring guest posts by Research Assistants at The Life of Words. Here Cosmin Dzsurdzsa – a second-year undergraduate in English at UW – thinks about moving from human intuition to computer rule-making in textual-genre classification: When trying to automate text classification algorithmically, one has to pay close attention to how […]

Vector Space and Poetic Logic

I’ve been spending the weekend experimenting with vector space modelling and poetic language. Vector space word embedding models use learning algorithms on very large corpora in order to map each token (=word) in the corpus to a unique location in n-dimensional space. “N-dimensional space” is just a mathy-sounding way of saying that multiple (or n) features […]
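To make “a location in n-dimensional space” concrete, here is a minimal sketch with n = 3. The three words and all of their coordinates are invented for illustration (a real model learns hundreds of dimensions from a large corpus), and the helper names `cosine_similarity` and `nearest` are hypothetical. Cosine similarity is one common way of measuring how close two word vectors are; nothing here is taken from the post itself.

```python
import math

# Toy 3-dimensional "embeddings". The numbers are made up for illustration;
# a trained model would learn them from a corpus.
vectors = {
    "rose": [0.9, 0.1, 0.3],
    "tulip": [0.8, 0.2, 0.4],
    "engine": [0.1, 0.9, 0.2],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word):
    """Return the other word whose vector is most similar to `word`'s."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


if __name__ == "__main__":
    print(nearest("rose"))
```

With these made-up coordinates, “rose” sits closer to “tulip” than to “engine”, which is the sense in which words used in similar contexts end up as neighbours in the space.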

Method as Tautology

Although it has been available for a while in the advance access section of Digital Scholarship in the Humanities, and before that Literary and Linguistic Computing, my article on digital methods in literary research has recently been published in its final version. The full bibliographic details are: Williams, David-Antoine. “Method as Tautology in the Digital […]

Little Miss Bossy Pants

In the comments to a Facebook share of my previous post on gendered language on Ratemyprofessors.com [“Vivid Unconscious Biases“], JB, a friend of a friend, writes: “bossy” is an inherently gendered term and is always used as an insult. I can’t remember ever hearing it applied to a man. Indeed, it strikes me that calling […]

Assessing Poetry Assessor

The web application called “Poetry Assessor” has had a second wave of attention since going back online recently. In this post I want to show why Poetry Assessor doesn’t assess poetry, and to make a broader point about the “Humanities” in “Digital Humanities”: that bad disciplinary training makes for bad interdisciplinary work. It’s a longish […]

“Pneumatic Bliss” – Eliot’s Breasty OED Entry

More from the T. S. Eliot / Oxford English Dictionary files [for background, see “Did TSE use OED, SOED, or COD?” and “Eliotic OED“]. In the latter post, I noted that 0.0135% of OED definitions contain the phrase “[with/in] allusion to” and that two of these are to poems by Eliot. Here are lines from […]

The Queen’s English – Respec’

Looking through some graduate work the other day I came across a reference to “the Queen’s English,” in scare quotes, used in the general sense to describe the phenomenon of socially privileged dialect (as opposed to a specific British class dialect). I’ve never heard “the Queen’s English” actually referred to positively or unironically. In my […]