Ever since OED first began publishing entries in 1884, word geeks have been trying to one-up the big dictionary by finding older instances of words than the oldest it cites (such discoveries used to fill up the pages of Notes and Queries). From time to time in supplements and additions OED would update an entry with an older citation, but with the revision project that is OED Online (OED3), it got into the antedating business big-time, with every entry subject to review. OED’s antedatings even get written up, with such arresting headlines as:
A splendid antedating of white lie
An antedating for gay, and other treasures from the Burgess Papers
A huge find for the OED – a startling antedating for partner meaning ‘spouse’
Revised entries began appearing in OED Online in 2000, and have been released in quarterly batches ever since. Recently I’ve been working on antedatings, so I thought to look into how the rates of antedating have changed since the project started over 20 years ago.
Here’s a graph showing the total rate of antedating (percent of entries revised with older first citation dates than the corresponding entry in OED2) and the rate of very big antedatings (50 years or more).
Beyond the clear improvement in both metrics over time, the graph shows an acceleration, especially for all antedatings, between 2006 and 2012, bordered on both sides by relatively flat curves. In those 6 or so years, the project went from a 40% to a 60% antedating rate, a remarkable improvement.
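For anyone who wants to see how the two rates can be worked out, here is a minimal sketch of the calculation as defined above: the share of revised entries whose first citation is older than in OED2, and the share antedated by 50 years or more. The record layout, the field names, and the sample entries are all my own assumptions for illustration. They are not the actual data behind the graph, and this is not necessarily how the graph was produced.

```python
from dataclasses import dataclass


@dataclass
class RevisedEntry:
    """Hypothetical record for one revised entry (field names are assumptions)."""
    headword: str
    oed2_first_year: int  # earliest citation year in OED2
    oed3_first_year: int  # earliest citation year in the revised entry


def antedating_rates(entries, big_gap=50):
    """Return (percent antedated, percent antedated by big_gap years or more)."""
    if not entries:
        return 0.0, 0.0
    # Positive gap means the revised entry has an older first citation.
    gaps = [e.oed2_first_year - e.oed3_first_year for e in entries]
    antedated = sum(1 for g in gaps if g > 0)
    very_big = sum(1 for g in gaps if g >= big_gap)
    total = len(entries)
    return 100 * antedated / total, 100 * very_big / total


# Invented sample entries, for illustration only.
sample = [
    RevisedEntry("example-a", 1850, 1850),  # not antedated
    RevisedEntry("example-b", 1900, 1890),  # antedated by 10 years
    RevisedEntry("example-c", 1700, 1650),  # antedated by exactly 50 years
    RevisedEntry("example-d", 1949, 1013),  # antedated by 936 years
]

all_rate, big_rate = antedating_rates(sample)
print(f"antedated: {all_rate:.1f}%, by 50+ years: {big_rate:.1f}%")
```

Running the same function over each quarterly batch separately, rather than over everything at once, would give one point per release along the lines of the graph.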
One can guess at some of the reasons for this: no doubt a large part is due to the availability of quality searchable historical databases, such as EEBO and ECCO, and newspaper collections such as Australia’s Trove. Google Books also came online around this time (Google’s university partnerships were announced in late 2004; in 2010 it passed the 10 million books mark). I’m looking into some other less obvious hypotheses, but I’d guess the availability of both digitized document repositories and good searching algorithms geared towards historical texts (e.g. multiple word-spellings and word forms) plays the biggest role. [More advanced corpus techniques are discussed here].
You want to know what the biggest antedating has been so far, don’t you? Well, I’ll tell you: it’s the not-very-interesting out-half, n., for which OED2 had only the modern meaning in Rugby football (a position on the field that I’d call the fly-half), first recorded in 1949. OED3 found an Old English use 936 years earlier, meaning “The outside, the exterior” and included it in the same entry.
Most of the very big antedatings follow this pattern: a modern term that is probably independently invented (formed, compounded, re-borrowed, etc.) but which has an identical predecessor that escaped the editors of the Second Supplement: throughway, outshove, and wending are like this.
But tit, antedated by a stonking 903 years, is another story! OED2 didn’t record any uses before 1928, meaning “a woman’s breasts” etc., only saying that as a variant of more general teat it was obsolete and dialectal. OED3 has now documented the teat sense, continuously from Old English (in the Lindisfarne Gospels – Luke 9:27 if you want to know) to 1992. But it has also managed to antedate the “woman’s breasts” sense to Old English, adding two 13-14thC quots and an 1862 quot. Now that’s what I’d call splendid:
I like to say we’re in a golden age of antedating, and now you’ve delimited the start of the golden age! One of the great achievements of the age is the triumphant antedating of “the whole nine yards” to approximately 1907 (summarized by Dave Wilton at Wordorigins), once thought to be an impossible mystery. As recently as 2005, Arnold Zwicky wrote that it was “almost surely false” that the phrase was in common use by the 1950s/early 60s. He drastically underestimated how much pre-1950 writing existed that was yet to be searched.
Another example of premature despair: in 1986, Yvonne Warburton wrote about the then-unsuccessful search for documentation of “the thin red line” from the Crimean War, with the doleful conclusion “These are the kind of questions which OED detective work can unfortunately never answer.” Her article was reposted on oed.com, and boom! somebody got a cite from January 1855 in The Times, and it went into the March 2007 OED update. John Simpson tells the full story in The Word Detective.
[…] is also an issue: current practice, which takes advantage of large digital text databases, has achieved an antedating rate of about 60% over earlier editions, showing just how provisional a date of first attestation can be. But […]
The step change would be even sharper if you had access to the originally published versions, since some of the older revised entries have since gotten antedates that they didn’t have on first publication. There’s a very interesting blog post from 2014 on the revision of un-: very often, the earliest evidence found in 2014 for an un- word was earlier than the earliest evidence they’d found ca. 2000 for the corresponding word without un-, so they had to re-check and update a lot of the “positive” words. For example, they found unmacadamized earlier than the first citation in the entry for macadamized published in 2000, so they did a fresh search and found an even earlier macadamized — and that in turn prompted a fresh search for the verb macadamize and an earlier example for that, too. And work on unmated triggered a revision and re-ordering of senses of mate, v.
Yes, I think by 2014 it was clear that you had a pretty good chance of antedating all but the most recently revised entries in the improved and expanding databases. This is still the case, as LOWbot demonstrated recently (focusing on M, of course, since it was revised first – hence macadamized, mated, etc.).
But what do you mean by “the originally published versions”? My comparison set is the 1989 OED2.
Sorry, I meant the first OED3 versions: e.g. if you’d had access to the entry for macadamized, adj. as published in March 2000 (I assume you didn’t?), it would *not* have antedated the OED2, but in the versions posted since December 2014, it does. I suspect that mailable is a similar case. So the rate of antedating in 2000-2006 was actually slightly lower than the chart shows.
Right, yes, I’m sure that’s correct. The earliest I have is a 2015 copy. Although there is probably a way to root out later quots added to entries revised earlier, it wouldn’t be straightforward. They keep them over at OED though, or so I’m told.