The Lifespan of Words (three ways)

Getting ready for DH2017 this morning, I found myself curious about the lifespan of English words: when they come into the language and when they fall out. So I pulled the earliest and latest attestation dates for every word in OED3 and plotted them. Here are three graphs ("visualizations," if you like), all based on exactly the same data, that show slightly different things about the lifespan of words. Since I'm off to the conference very shortly, I'll give all three here with minimal comment. For larger PNGs click on the image, or download the linked PDF for a detailed and scalable view.

  1. This graph shows the earliest and latest attestations of each word, stacked from bottom to top in order of earliest attestation. About half of the 2017 OED3 entries are really 1989 OED2 entries, which in turn are mostly 1884-1933 OED1 entries. So we should treat 1800-1850 as a fuzzy range for obsolescence today: words last attested before then are probably now obsolete (and in fact the blurry boundary is fairly visible at around 1850). The slope of the graph represents the rate of new-word attestation: slow until 1400, then faster, with a bit of a slowdown around 1700 (probably due to OED's well-known deficit of 18th-century quotations rather than a feature of the language). Thicker horizontal white breaks indicate periods during which new words didn't last very long.

  2. Graph 2 flips the ordering of the bars (latest-to-earliest as you go up), which has the effect of left-justifying the graph. Here the slope represents the number of words still attested as active in whatever period on the x-axis you're looking at (bearing in mind the fuzzy 1800-1850 boundary for words that could be obsolete today). This gives a better view of the language at any one time, in terms of the relative age of its vocabulary (such as it has been preserved).
  3. Same data again, except that here it's ordered by the duration of each word (with equally long-lasting words ordered by latest attestation). It stands to reason that the longest-lasting words are the oldest ones, which is why the bottom of the graph is solid orange: this is sometimes called the "Old English substrate" of the language.
    The sail shape made by the densely packed bars just above that seems to suggest that durable words are produced at a fairly steady rate (the slope is mainly a function of how many years are left until the present). This would mean that the changes in slope in the previous graphs reflect fluctuations in ephemeral words, which would explain the 18th-century gap (OED would be less likely to overlook a durable word than an ephemeral one, obvs.). However, the "shadow" of the sail suggests somewhat denser periods of production around 1400 and from c. 1550 to c. 1650. To my eyes that looks like the effect of Chaucer and Shakespeare, though most of the gap between the main sail and its shadow, especially after 1800, is probably due to variation between editions (i.e. unrevised OED1 entries will have earlier latest attestations).
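For anyone wanting to try something similar, here is a hypothetical Python sketch of the three orderings (not the code behind these graphs, which isn't shown here). It assumes each entry reduces to a word plus its earliest and latest attestation years. The names (`Entry`, `order_by_earliest`, `active_in`, etc.) are illustrative. Reading graph 2 as "ordered by latest attestation, most recent at the bottom" is an assumption, as are all tie-breaking rules, and the sample entries are toy placeholders, not OED data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical record: one headword with its first and last attestation years."""
    word: str
    earliest: int
    latest: int

    @property
    def duration(self) -> int:
        # Span in years between first and last attestation.
        return self.latest - self.earliest


def order_by_earliest(entries: list[Entry]) -> list[Entry]:
    """Graph 1: bars bottom to top by earliest attestation (ties by latest, assumed)."""
    return sorted(entries, key=lambda e: (e.earliest, e.latest))


def order_by_latest(entries: list[Entry]) -> list[Entry]:
    """Graph 2, ASSUMED reading: most recent latest attestation at the bottom."""
    return sorted(entries, key=lambda e: (-e.latest, e.earliest))


def order_by_duration(entries: list[Entry]) -> list[Entry]:
    """Graph 3: longest-lived words at the bottom; ties by latest attestation (ascending assumed)."""
    return sorted(entries, key=lambda e: (-e.duration, e.latest))


def active_in(entries: list[Entry], year: int) -> int:
    """Count words whose attested span covers `year` (earliest <= year <= latest)."""
    return sum(1 for e in entries if e.earliest <= year <= e.latest)


def new_per_period(entries: list[Entry], start: int, end: int, step: int) -> dict[int, int]:
    """Count first attestations in each [lo, lo + step) bin: the rate graph 1's slope reflects."""
    return {
        lo: sum(1 for e in entries if lo <= e.earliest < lo + step)
        for lo in range(start, end, step)
    }


def probably_obsolete(entry: Entry, cutoff: int = 1800) -> bool:
    """Heuristic: last attested before the fuzzy 1800-1850 boundary (cutoff value is an assumption)."""
    return entry.latest < cutoff


if __name__ == "__main__":
    # Hypothetical toy entries, NOT OED data.
    sample = [
        Entry("word_a", 900, 2017),
        Entry("word_b", 1390, 1420),
        Entry("word_c", 1600, 2017),
        Entry("word_d", 1590, 1650),
    ]
    print([e.word for e in order_by_earliest(sample)])  # ['word_a', 'word_b', 'word_d', 'word_c']
    print([e.word for e in order_by_duration(sample)])  # ['word_a', 'word_c', 'word_d', 'word_b']
    print(active_in(sample, 1610))  # 3
```

With real data, each ordered list would then be drawn as one horizontal bar per entry, running from `earliest` to `latest`. The plotting step is left out here.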

Anyhow, I’m off. Tell me what you see.

