OED3 Revision, Revised for 2020

In my previous post [OED3’s Revision Status (c. 2018.12.15)] I took a bird’s eye view of when various parts of the Oxford English Dictionary Online (OED3) were added, and when revised (if they’ve been revised). I came up with a figure that at the end of last year (2018), 50.4% of entries in OED3 were either newly added or revised since 2000, when OED3 first launched. This represented 67% of the dictionary text, mainly because revised entries tend to be twice as long as the entries they replace.
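
The jump from 50.4% of entries to 67% of text can be checked on the back of an envelope. Here is a minimal sketch, assuming every revised or new entry is exactly twice as long as an unrevised one (the post only says "tend to be"); the function name and the unit-length convention are mine, for illustration only.

```python
def text_share(entry_share, length_ratio):
    """Share of total text held by revised entries.

    entry_share:  fraction of entries that are new or revised (0..1)
    length_ratio: assumed length of a revised entry relative to an
                  unrevised one (unrevised length is taken as 1.0)
    """
    revised_text = entry_share * length_ratio
    unrevised_text = (1.0 - entry_share) * 1.0
    return revised_text / (revised_text + unrevised_text)


# 50.4% of entries, each assumed to be twice as long as an unrevised entry
share = text_share(0.504, 2.0)
print(f"{share:.1%}")  # 67.0%
```

Under that simplifying assumption the two figures agree, which suggests the "twice as long" rule of thumb accounts for most of the gap.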

Well, in the last year the proportion of new and revised entries has gone up to 52%, as of the latest (December 2019) update! That’s progress! To put it in historical perspective, here’s what the dictionary’s revision progress has looked like at the start of each year for the past 20 years:

So when will OED3 be a completely up-to-date record of the English language?

Well, never (sorry), since it will always lag at some distance behind the times.

But you might say that the phase of OED3 as mainly a Great Revision project will be complete when there are no more pre-2000 entries to revise, and all new content is “new”, rather than remediated, even if it is being added to existing entries to reflect sense development etc. So with any luck OED3 will not end up as one of those famously hard-to-finish dictionary projects (even though OED got going 135 years ago), which have recently been in the news.

Compared to many of those projects, OED works at a clip. [It is not a fair comparison–basically nothing is comparable among those various endeavors, all of them very much worth the long while.] But it works at a clip. Since 2000 OED3 has revised an average of 6,300 old entries each year. Pause on this for a second: that means that OED has been producing 17 revised entries per day since 2000! I do not know how to calculate this exactly, but I’m guessing it’s amortized over a very large number of lexicographer-hours. (I mean, how many OED entries do you think you could revise in one day?)
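
For anyone who wants to check the per-day figure, here is the arithmetic as a short sketch. The 6,300 figure is the average quoted above; the 250 working days per year is purely my assumption, added to show how the number changes if weekends and holidays are excluded.

```python
ENTRIES_PER_YEAR = 6300          # average revised entries per year, from the post

# Spread over every calendar day of the year
per_calendar_day = ENTRIES_PER_YEAR / 365

# Hypothetical: spread over working days only (250 is an assumption)
WORKING_DAYS_PER_YEAR = 250
per_working_day = ENTRIES_PER_YEAR / WORKING_DAYS_PER_YEAR

print(round(per_calendar_day, 1))  # 17.3
print(round(per_working_day, 1))   # 25.2
```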

Truth be told, the actual number of entries revised per year has fluctuated considerably, and in the last five years or so the numbers have been somewhere between 3,300 and 5,400, so below the average. This could be for any number of reasons. [There are also some data subtleties I’m glossing over in this analysis: for one thing, it is not uncommon for a single unrevised entry to turn into two or more revised entries.]

If the overall rate of revision continues, however, we could see completion of this phase of the project well within [see update in comments — read “in about”] 20 years:

[*see updated chart in comments]

The dark blue line here represents the actual increase in % of new and revised entries, the dotted line a projection based on the average over the span (+2.6%/y). According to this rough measure, this phase of the project could wrap up a couple of years before 2040 (add two or three more years if you ignore new entries in the calc).
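
The dotted-line projection is just a straight-line extrapolation, and it can be sketched in a few lines. The assumptions here are mine: the share is 52% at the start of 2020, the starting point is 0% in 2000, and the average gain (52 / 20 = 2.6 percentage points per year) simply continues. The function name is hypothetical.

```python
def projected_completion_year(current_share, rate_per_year, as_of_year):
    """Year at which the share reaches 100% under a constant rate.

    current_share: % of entries new or revised at as_of_year
    rate_per_year: percentage points gained per year (assumed constant)
    """
    remaining = 100.0 - current_share
    return as_of_year + remaining / rate_per_year


share_2020 = 52.0             # % new or revised, start of 2020
avg_rate = share_2020 / 20    # 2.6 points per year over 2000-2020

year = projected_completion_year(share_2020, avg_rate, 2020)
print(f"{year:.1f}")  # 2038.5
```

A slower rate pushes the date out quickly: at 2.0 points per year the same function gives 2044.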

Everyone has their own reasons for wanting more updated entries in OED3. I myself am looking forward (as I’ve said elsewhere) to updated DINNER (OED1: ‘The chief meal of the day, eaten originally, and still by the majority of people, about the middle of the day (cf. German Mittagsessen), but now, by the professional and fashionable classes, usually in the evening;’) and ARBITRARY (no Saussure!), among others.

But beyond that, as you, dear Reader, have already surmised from my incessant counting, summing, and graphing, I would very much like to use OED to tell me accurate and even measurable things about the English language today (or some recent today), at scale.

But for any data application pertaining to that very worthy object of study (as opposed to, say, the OED as a document, which is also very worthy), getting more revised entries is of paramount importance. Right now the OED is not fully useable as a dataset for the historical language, because 48% of its entries were compiled before 1933 and still have not been revised since OED3 launched in 2000.

2 Comments

  • Emmanuel Pantos wrote:

    Wonderful article but I have some reservations on the extrapolation of the graph. Not consistent with 100% completion by 2040. Unless there is a climate change tipping point by then and even words are evaporated out of the English universe, which I very much doubt.

    I would love to have the data points for this graph.
    I am lectophilic (lectophile?), too (words not in the OED yet, though electrophilic, necrophilic, aerophilic, cytophilic, geophilic, and -phile ones are).
    Neither are quite a few other words of Greek origin that are found in Google searches and academic articles.
    They will keep you busy for centuries, I am sure.

    Marvelous piece of work, congratulations to the team.

    Dr Manolis Pantos.

  • Manolis Pantos’s comment caused me to go back and have a look at my figures. I’ve now recalculated in what I think is a better way, plus added variables for the average rate of revision, comparing the entire 20y average to the last 10y average and the last 5y average (the trend is, um, worrying?). So here is the update:

    Now might also be a good time to reiterate that I am not affiliated with Oxford University, OUP, or the OED.
