No author’s representation in the OED has received more comment than Shakespeare’s: if you ever come across a mention of OED citation evidence, more than likely it’s being used to substantiate (sometimes challenge or qualify) a claim that Shakespeare invented the most English words, or made up the most new meanings for existing words, or knew all the best words, or something of that nature. One rarely hears the name Sir Thomas Urquhart of Cromarty mooted about for his OED stats. In this post I want to look behind some of the regularly cited Shakespeare numbers to give a slightly more detailed and rounded view of his contribution. What I have to say here concerns OED1, the edition published in 1928, but the picture is much the same for OED2 (1989). The current OED3 revision is another sort of beast.
It is undeniably the case that Shakespeare is very often credited with being the first to put a word on paper (this is not the same as inventing a word, though probably he did that too). Although three late-medieval sources beat Shakespeare in this category (Cursor Mundi, Chaucer, and Wyclif), that’s largely because of the relative lack of competing documentation in the period. Perhaps even more impressive is the large number of earliest citations for a subsequent sense, a category in which Shakespeare is first, and by a large margin.
Most figures given for Shakespeare tend to lump all of his quotations together, but splitting out the quotations by work can give us a bit more detail, and also allows us to compare longer works (such as Troilus and Cressida or Hamlet) to shorter ones (such as the Sonnets) on the same basis. So, in the first chart, below, I show the rate of first citation (for a word or for a sense) for each of Shakespeare’s cited works (y-axis), relative to the rate of total citation for these works (x-axis). The rate simply divides the quotation count in each category by the word-count for that work.
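A minimal sketch of that rate calculation in Python. The function name is illustrative, and the only figures used are the ones quoted later in this post for A Lover’s Complaint (2,560 words, 136 total citations); this is not the script behind the chart.

```python
def citation_rate(quotation_count: int, word_count: int) -> float:
    """Quotations per word: the count in a category divided by the work's length."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return quotation_count / word_count


# A Lover's Complaint: 136 total OED citations over 2,560 words.
alc_total_rate = citation_rate(136, 2560)
print(f"{alc_total_rate:.4f} citations per word")                   # 0.0531
print(f"about one citation every {1 / alc_total_rate:.0f} words")   # about 19
```

The same function gives the y-axis value when the count passed in is the number of first citations rather than the total.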
The hollow circles clustered on the lower-left-hand side represent the closest comparator works, drawn from the ten most-cited (in raw terms) dramas of the 16th, 17th, and 18th centuries.
From the very marginal horizontal overlap between the two groups, we can see that Shakespeare’s works are being cited at a significantly higher rate overall. This is no surprise. Also unsurprising is that Shakespeare’s works fall mostly higher on the vertical axis, measuring the rate of earliest citation. In fact overall citation and earliest citation are highly correlated (R-squared=.75, as long as you ignore outliers A Lover’s Complaint and The Tempest). The more a work is cited, the more earliest citations it tends to furnish.
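For readers who want to see what an R-squared figure of this kind measures, here is a small Python sketch that computes the squared Pearson correlation between two lists of rates. The numbers in it are made up for illustration and are not OED data; outliers would simply be dropped from both lists before the call.

```python
from statistics import mean


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Hypothetical rates (citations per word), NOT taken from the OED.
total_rate = [0.010, 0.020, 0.030, 0.040, 0.050]
first_rate = [0.002, 0.003, 0.006, 0.006, 0.009]
print(f"R-squared = {r_squared(total_rate, first_rate):.2f}")
```

A value near 1 means first-citation rate rises almost in lockstep with total citation rate; a value near 0 means the two are unrelated.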
That makes sense. But there’s a bit of information embedded in this chart that is hard to see, and gives a slightly more nuanced view of how each work is being used to document new English words and senses, and that is the proportion of quotations from each work documenting these. I mean, Jonson’s Volpone is about the same length as Shakespeare’s Hamlet. If OED is simply citing Hamlet ten times for every one citation of Volpone, that might go a long way to explaining Hamlet’s tenfold advantage in first-citations, simply on statistical grounds.
So the next chart shows, on the vertical axis, the percentage of each work’s OED quotations that are earliest quotations (word and sense).
The picture, all of a sudden, is very different. The horizontal distribution has not changed, but the vertical axis now shows the non-Shakespearean comparators with mostly higher first-citation proportions than Shakespeare’s works (the median, in fact, is 5 percentage points higher). This means that, with the exception of Addison’s Cato and the 2nd Part of Return to Parnassus, all the comparators have a higher proportional first citation rate than the majority of the Shakespeare dots, with five (Volpone, Frier Bacon and Frier Bungay, Ralph Roister Doister, Alchemist, and Staple of News) higher than all but two (ALC and L.L.L.).
Even some of Shakespeare’s works start to look a little different. One thing that’s apparent is that whatever correlation was displayed on the first graph, it has totally evaporated: having more OED quotes per word has no effect on what percentage of those quotes end up documenting a first usage. Tempest looks like even more of an outlier, in the same top spot on the x-axis but now even closer to the bottom of the y. And ALC distances itself even further from the pack of Shakespeare works.
So was Shakespeare much more lexically inventive than, say, Jonson, Udall, Marston, or Greene? What is the more telling metric: raw counts of first-uses, or percentages of quotes that are first uses?
I think both have their uses, but we can actually break the latter category down even further to get a highly detailed view of how OED lexicographers used the documentary evidence from each work.
To get a more objective view, we need to look at different kinds of earliest citation. The category most people are familiar with is “first in entry”, but there’s a special case that should be considered separately: entries with only one citation. This is significant because single-citation entries quoting literary works tend to be nonce uses, hapax legomena or even transcription errors, which some have speculated are more likely to be recorded simply on the basis of the prestige of the author (Shakespeare’s cyme, Eliot’s opharion, e.g.). And we want to separate out earliest-in-entry from earliest-in-subsequent-senses, in order to get a proxy for sense extension, or using old words in new ways.
So there are three mutually exclusive categories to distinguish: (1) only citation in entry, (2) earliest citation in entry, (3) earliest citation in subsequent sense (and an implied fourth: the rest – add them all and subtract from 100%). Below are all the same works, on the same horizontal axis, but split out now on the vertical among these three categories. As it gets a bit messy in there, I’ve added a few vertical trails to link up each category for some of the more interesting works.
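The arithmetic behind that split can be sketched in Python as follows. The function and key names are my own labels for the three categories, and the counts in the example are hypothetical, chosen only to make the percentages easy to read.

```python
def category_shares(total, only_in_entry, first_in_entry, first_in_later_sense):
    """Percentage of a work's quotations in each mutually exclusive category,
    plus the implied fourth category ("rest")."""
    counts = {
        "only_in_entry": only_in_entry,
        "first_in_entry": first_in_entry,
        "first_in_later_sense": first_in_later_sense,
    }
    if total <= 0 or any(c < 0 for c in counts.values()):
        raise ValueError("counts must be non-negative and total positive")
    if sum(counts.values()) > total:
        # Mutually exclusive categories cannot add up to more than the total.
        raise ValueError("category counts exceed total quotations")
    shares = {name: 100 * c / total for name, c in counts.items()}
    shares["rest"] = 100 - sum(shares.values())
    return shares


# Hypothetical work: 200 quotations, of which 10 / 30 / 24 fall in the three categories.
for name, pct in category_shares(200, 10, 30, 24).items():
    print(f"{name:>22}: {pct:.1f}%")
```

Because the categories never overlap, the three percentages for a work can be stacked on one vertical axis without double counting.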
There are some telling things worth pointing out in this chart. First is that the green triangles, the earliest-in-subsequent-sense dots, are fairly evenly distributed between 8% and 15%, with not much to separate non-Shakespearean and Shakespearean works.
The real difference starts to show in the blue-diamond only-citation-in-entry dots, where non-Shakespearean works are mostly higher than most of Shakespeare’s works, and is even more evident in the red-square earliest-in-entry dots. In this latter category, the non-Shakespearean works are almost all much higher than the Shakespearean average. To paraphrase the chart: the evidence drawn from these works by Udall, Greene, and Marston (Ant. & Mel., anyway) is proportionally weighted much more towards first-usage documentation for words than most of the evidence from Shakespeare’s works.
Well, what to say about A Lover’s Complaint, then, which outdoes every other work on the vertical axis in every category: cited more for single-citation entries, for earliest citations in an entry, and for earliest citations in a subsequent sense?
It may just be a statistical outlier: at only 2,560 words and 136 total citations in OED, it is the work with the least actual documentary evidence in this chart, making every citation that much more able to skew its positioning on both axes.
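To put the small-sample point in numbers, the sketch below shows how far a single citation moves a work’s percentage in any category. The figure of 136 is ALC’s total from above; the multiples of 3 and 8 are illustrative stand-ins for more heavily cited works, not counts for any particular play.

```python
def points_per_citation(total_citations: int) -> float:
    """Percentage points by which one citation shifts a work's proportion."""
    if total_citations <= 0:
        raise ValueError("total_citations must be positive")
    return 100 / total_citations


alc_total = 136  # total OED citations for A Lover's Complaint
for multiple in (1, 3, 8):
    n = alc_total * multiple
    print(f"{n:>5} citations: one citation = {points_per_citation(n):.2f} percentage points")
# 136 -> 0.74, 408 -> 0.25, 1088 -> 0.09
```

With so few citations, reclassifying just a handful of ALC’s quotations would shift it by several percentage points on the vertical axis.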
Even so, in the category of earliest-in-entry, even in raw counts ALC rivals or beats (in descending order): A Midsummer Night’s Dream, The Winter’s Tale, 3rd part Henry VI, Julius Caesar, and Pericles, all of which have between 3 and 8 times the total number of citations as ALC.
Another possibility may have to do with the intrinsic linguistic qualities of the text. Though published along with the Sonnets in 1609, ALC has always struck readers as a bit of an anomaly. As Catherine Kraik summarizes, in Shakespeare Quarterly:
The poem has long presented something of an enigma to scholars and has for centuries been overlooked and undervalued. Readers are often frustrated by the poem’s complex syntax, which occasionally verges on impenetrability; more significantly, the poem’s reception history is characterized by uncertainty about its attribution to Shakespeare and doubts over whether its 1609 appearance was authorized. Ward E. Y. Elliott and Robert J. Valenza recently revived the authorship debate, arguing on the basis of computer-assisted textual analysis that both A Lover’s Complaint and A Funeral Elegy “fail too many Shakespeare tests to look much like Shakespeare.” Other scholars have concluded to the contrary that the poem can confidently be attributed to Shakespeare and, further, that it was written at some point between 1602 and 1605.
Whatever the doubts about its authorship, ALC was usually published along with the Sonnets in Collecteds and Variorums, and was included in the Furness concordance of 1879, meaning that OED lexicographers had the same basic tools at their disposal in locating quotations where ALC was concerned. Kraik’s summary suggests something distinctive about the language of the text, however, and other sources describe its diction as uncharacteristically archaistic and Latinate.
But looking at the 50 words for which ALC is cited first, either for a word or a sense, they don’t strike me as particularly archaistic or Latinate at all. I’ve listed these, for the three types of first citation, below.
| ONLY Q | 1st Q in ENTRY | 1st Q in SUB. SENSE |
|---|---|---|
| empatron, v. | betrayed, ppl. a. | beaded, ppl. a. |
| invised, a. | bonded, ppl. a. | caged, ppl. a. |
| phraseless, a. | destined, ppl. a. | confine, sb.2 |
| | distractedly, adv. | daff, v.2 |
| | encrimsoned, ppl. a. | dismount, v. |
| | enswathe, v. | distance, sb. |
| | fluxive, a. | distance, sb. |
| | impleach, v. | figure, sb. |
| | launder, v. | formal, a.1 |
| | livery, v. | guard, sb. |
| | lovered, ppl. a. | hive, sb. |
| | orbed, a.1 | little, a. |
| | pellet, v. | orb, sb.1 |
| | posied, a. | pensive, a. |
| | reword, v. | ride, v. |
| | sistering, ppl. a. | ruffle, sb.2 |
| | supplicant, sb. | sheaved, ppl. a. |
| | unexperient, a. | storm, v. |
| | | suffering, ppl. a. |
| | | unapproved, ppl. a. |