OED Work on “Writing and Editing” Podcast

I talked with Wayne Jones the other day about my work on the Oxford English Dictionary. The result was this short piece on my plans for an updated OED bibliography and Variorum:

182. Enhancing the Oxford English Dictionary


  • kts wrote:

    Thanks for the update! Do I understand correctly that what you got on the thumb drive was the source files from which the OED2 was created, with all the markup tags? The pages that it now shows at the “Previous version: OED2 (1989)” links were presumably created from those files, translated into current HTML. I have been wondering about those pages, since they are visibly different from the printed OED2 in the treatment of apostrophes: in the web version, apostrophes are represented by straight marks ('), while closing single quote marks are curly (’ aka high-9). For example, note the difference between the apostrophe in don't and the quotation marks in this quotation under argumentatively:

    ‘I don't call it honouring the Sabbath to sit down to a worse dinner than on a work-a-day,’ Jim remarked argumentatively.

    This difference is not visible in the printed OED2, which represented both by curly marks. But since it’s in the web pages, presumably it must have been in the original markup? Were the typists instructed to make this distinction?

    The OED3 continues to make this distinction in new and revised entries, and they impose it on quotations, too. They must think it’s necessary, but I don’t understand why. What is it good for?

  • I’ve wondered about this – and the situation is even stranger in the 1989 text I have, which in the quot you cite looks like this:

    `I don't call it honouring the Sabbath to sit down to a worse dinner than on a work-a-day,' Jim remarked…

    The open-quotation mark is actually ASCII-96, “Grave Accent”, while both the apostrophe and close-quotation marks are ASCII-39 “Single Quote”.
    If I had to guess I’d say that the typists keyed in Grave Accent for open-quotes and single quotes for everything else because that’s what was available on their keyboards and in the standard ASCII set. At some later point the text was likely algorithmically revised to the Unicode curly open- and close- quotes you’re seeing in “Previous version”, but no one bothered to revise the apostrophes.
    To confirm this you’d have to find an anomalous artefact of the replacement algorithm in those “Previous version” pages, but I’m not aware of any way to search these pages.
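
For readers keeping track of which mark is which, here is a minimal Python sketch that prints the code point and Unicode character name of the four marks discussed so far. The helper name `describe` is made up for illustration. Note that Unicode names ASCII-39 "APOSTROPHE", not "Single Quote".

```python
import unicodedata

# The four marks under discussion: the two ASCII marks in the 1989 text
# and the two curly marks seen on the "Previous version" pages.
MARKS = ["`", "'", "\u2018", "\u2019"]

def describe(ch):
    """Return (decimal code point, U+ notation, Unicode character name)."""
    return ord(ch), f"U+{ord(ch):04X}", unicodedata.name(ch)

for ch in MARKS:
    print(ch, *describe(ch))
```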

  • ktschwarz wrote:

    That 1989 text is just what would be expected given the limits of a standard keyboard, as you say (though since the typing was subcontracted to a specialized company, we don’t know for sure that they used standard keyboards). That’s not strange. The strange part is how they changed it afterward. How could it have been algorithmically revised, if the difference between apostrophes and close quotes wasn’t already coded? Algorithms are not that good. Can you show me an algorithm that correctly converts this input to this output, in the definition of flesh, sense 4a:

    in recent use primarily suggesting `butchers' meat', not poultry, etc.
    in recent use primarily suggesting ‘butchers' meat’, not poultry, etc.

    … while also getting this conversion right under hollow meat?

    `poultry, rabbits, etc., any meat not sold by butchers'
    ‘poultry, rabbits, etc., any meat not sold by butchers’

    The online version has butchers' with a straight apostrophe in the first one and not the second, though they were both curly on the printed page — and this is not just one case that might have been hand-repaired, it’s throughout.

    That’s why I thought it was coded by the original typists; if it wasn’t them, humans must have been involved at some point. And *why*? Why change the “previous version” pages, which we expect to be faithful (up to the limits of online typography; they can’t reproduce some characters from the print version)? Why invest so much effort into an artificial distinction that wasn’t there in the source text?

    If you can get answers, I’d be interested in hearing them.

  • It’s a bit of a puzzle, I’ll grant you. I think the algorithmic part wouldn’t be too hard in the case you cite, and would be helped by the fact that “open quote” is already marked out – you could decide most “s'”es based on whether there’s a valid close quote after it. But still in a text this large and varied there should be some inputs to flummox an algorithm and I haven’t found any artefacts of that as of yet.
    I /have/ found a couple of odd cases in which “grave accent” got changed to single quote, however, e.g. in surnames between initial “M” and a capital when in author name fields (eg T. M`C. Anderson), but not when in quotation text (eg M`Donnell).
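
One way to read the "valid close quote after it" idea is sketched below in Python. This is a guess at a heuristic, not a reconstruction of whatever process was actually used: after each grave accent, the last word-final straight mark before the next grave accent (or the end of the text) is taken as the close-quote, and every other straight mark stays an apostrophe. The function name `curl_quotes` is hypothetical.

```python
GRAVE, STRAIGHT = "`", "'"
OPEN, CLOSE = "\u2018", "\u2019"

def curl_quotes(text):
    """Convert grave/straight marks to curly quotes, leaving apostrophes straight.

    Assumed heuristic: the LAST word-final straight mark after a grave accent
    (and before the next grave accent) is the close-quote.
    """
    chars = list(text)
    candidate = None   # index of the latest word-final straight mark in the open span
    in_quote = False
    for i, ch in enumerate(text):
        if ch == GRAVE:
            # A new open-quote: settle the previous span first.
            if in_quote and candidate is not None:
                chars[candidate] = CLOSE
            chars[i] = OPEN
            in_quote, candidate = True, None
        elif ch == STRAIGHT and in_quote:
            next_is_word = i + 1 < len(text) and text[i + 1].isalnum()
            if not next_is_word:      # word-internal marks (don't) are never candidates
                candidate = i
    if in_quote and candidate is not None:
        chars[candidate] = CLOSE
    return "".join(chars)
```

This handles both butchers' examples, but it treats every grave accent as an open-quote (so M`C names and transliterations would need separate handling), and it fails on dialect text: given "say `woman' i' th' place" it wrongly turns the mark after th into the close-quote.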

  • A look through dialect quots is turning up some further challenges, eg.:
    1891 J. Baron Blegburn Dickshonary 44 ‘Aw'm gooin' to meet mi Moll to-neet’ is a varra common sayin' wi' factory lads: some o' th' better soort say ‘woman’ i' th' place o' Moll, but nooan so mony.

  • ktschwarz wrote:

    You think the algorithm wouldn’t be too hard? Try to write it! You don’t have prior information about which close-quotes are valid. You don’t have prior information on whether a word ending in s’ is a plural possessive or an apocope, as in ‘Bes’ li’l’ poker player in this hull state. Wanna sit in, stranger?’.

    The problem could at least be narrowed down mechanically:
    1. Change close-quotes at the beginning of words to apostrophes, as they always represent clipping (apheresis), as in ’tis, ’60s, etc. (This only works because human typists have already coded the open-quotes separately!)
    2. Change close-quotes within words to apostrophes. That covers possessives, elisions, abbreviations, clitics, special-case plurals (X’s), and foreign words (pa’anga).
    3. Close-quotes at the end of words may be possessives, apocope, or actual close-quotes. If there’s no preceding open-quote, it’s safe to change them to apostrophes.
    4. Close-quotes that occur in a perfectly ordered sequence open/close/open/close/… should be left as is.
    5. Flag all remaining undecided close-quotes for human attention.
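
The five steps above can be sketched in Python roughly as follows. This is an illustration under assumptions, not anyone's actual procedure: the input is assumed to use U+2018 for open-quotes and U+2019 for both close-quotes and apostrophes, "no preceding open-quote" is read as "no open-quote anywhere earlier in the text", and a "perfectly ordered" pair is read as exactly one word-final mark after an open-quote. The function name `narrow_close_quotes` is made up.

```python
OPEN, CLOSE, APOS = "\u2018", "\u2019", "'"

def narrow_close_quotes(text):
    """Apply steps 1-5 to `text`; return (new_text, flagged_offsets)."""
    chars = list(text)
    flagged = []
    group = None  # word-final marks since the last open-quote (None: no open-quote yet)

    def settle(marks):
        # Step 4: one mark after an open-quote forms a clean pair, keep it.
        # Step 5: several marks compete for one open-quote, flag them all.
        if marks is not None and len(marks) > 1:
            flagged.extend(marks)

    for i, ch in enumerate(text):
        if ch == OPEN:
            settle(group)
            group = []
        elif ch == CLOSE:
            followed_by_word = i + 1 < len(text) and text[i + 1].isalnum()
            if followed_by_word:
                chars[i] = APOS      # steps 1 and 2: word-initial or word-internal
            elif group is None:
                chars[i] = APOS      # step 3: word-final, no open-quote before it
            else:
                group.append(i)      # decided later by settle()
    settle(group)
    return "".join(chars), flagged
```

On the flesh example, both the mark after butchers and the mark after meat come back flagged; on the hollow meat example nothing is flagged and the single close-quote is kept. The flagged offsets are the residue that would need human attention.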

    But that’s still going to leave a lot of quotations, probably thousands, maybe tens of thousands. That’s too many to fix casually, it had to be planned. And where’s the benefit from all this work? It does *not* make any difference to the search function, which still does *not* distinguish apostrophes from close-quotes!

  • ktschwarz wrote:

    I think the change from T. M`C. Anderson to T. M'C. Anderson is a policy change on M' names. This Anderson is known to Wikipedia as Thomas McCall Anderson; his Treatise on Diseases of the Skin spells his name as “T. M’CALL ANDERSON, M.D.” on the title page, with a curly apostrophe/close-quote. The spelling of M' or Mc names has been very variable; sometimes a curly open-quote has been used perhaps because it looks a bit like a superscript c, but then you also see these names spelled with curly apostrophes, or straight ones, or with Mc. The OED has apparently now decided on M' with straight apostrophe for all such names. And the online version has simplified Anderson’s initials to just “T. M. Anderson”, even before full revision.

    Looks like they’ve done the same to some other M' names: e.g. under vacillating, the author’s name M‘Arthur (with open-quote) in previous print editions is now M'Arthur.

    What entry is M`Donnell in? Did you mean O’Donnell?

    If you want a really extreme apostrophe/quote challenge, check out the quotation from Evil-Eye Fleegle under whammy!

  • > What entry is M`Donnell in? Did you mean O’Donnell?

    s.v. accept, v (OED1 1884, OED2 1989):
    1882 Daily Tel. 17 May, (Cricket.) Leslie gave an easy chance to M‘Donnell at slip, which was not accepted.

    Quot suppressed in OED3 revision.

  • >The problem could at least be narrowed down mechanically:

    Adding some rules around punctuation would help considerably. (But not for my 1891 example, above).

  • PS Presumably you also turned up some examples of original “grave accent” used for other reasons, such as in transliterations, e.g.:
    “Written in a perfect Nasta`lîq, in four columns, with one gold and two ornamental rules.”
    These were (usually) open-quotes in printed editions and have been re-rendered as open-quote in digital OED2 etc.

  • ktschwarz wrote:

    “Quot suppressed in OED3 revision” — aha, thanks, that’s why I couldn’t find it. Yes, it looks like they applied a blanket change of M‘Foo names to M'Foo, but only in author names, they didn’t apply it to quotation text. Here’s another example that remains in OED online, not yet revised, under unsupported:

    1812 Duke of Wellington Dispatches (1837) IX. 349 Leaving behind them unprotected and unsupported the guns of Captain M‘Donald's troop.

    That’s a faithful transcription of the open-quote in the source (it’s on Hathitrust). They may change it to a straight apostrophe when they revise it. I don’t like that. The source should be quoted as is, as far as your typography can reproduce.

    Yes, the open quote (aka high-6, aka turned comma) has also been used for transliteration in the past, but is out of favor now. In that quotation under nastaliq, OED3 has made it unfaithful! They’ve changed the open quote, which was what was used in the source (on Hathitrust), to U+02BF ʿ MODIFIER LETTER LEFT HALF RING. That is a modern transcription for the letter ʿayin, but it was not yet in use in 1908. Using the half ring is appropriate in the etymology, which was written in 2003, but quotations should be faithful, and I’m unhappy that they’re not.

  • Update: I’ve been following up with some contacts who were around in those days. My guess above was not correct: the retyped text used its own (pre-Unicode) escape system to code all special characters, so it came to UW with curly quotes. These were then changed to grave-straight according to the LaTeX standard. OUP would have used the typed text as a base for the print OED2. The straightening of apostrophes etc. from close-curly could have happened from either the original curly or the UW grave-straight text, not that it would make a difference, I think. Still not sure about the process implemented to achieve it. It may well have been a combination of mechanical (for certain cases) + manual (for ambiguous cases) as you describe above.
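
To illustrate why the starting point would not matter, here is a small Python sketch of the curly to grave-straight conversion as a plain one-to-one character mapping. It is only a model of the idea: the real escape system is not reproduced here, and the function names are invented for the example.

```python
# One-to-one mappings between the curly marks and the LaTeX-style ASCII marks.
TO_ASCII = str.maketrans({"\u2018": "`", "\u2019": "'"})
FROM_ASCII = str.maketrans({"`": "\u2018", "'": "\u2019"})

def to_grave_straight(text):
    """Curly open/close marks -> grave accent / straight mark."""
    return text.translate(TO_ASCII)

def to_curly(text):
    """Grave accent / straight mark -> curly open/close marks."""
    return text.translate(FROM_ASCII)

curly = "\u2018I don\u2019t call it,\u2019 Jim remarked"
ascii_form = to_grave_straight(curly)
print(ascii_form)                     # `I don't call it,' Jim remarked
print(to_curly(ascii_form) == curly)  # True: the round trip loses nothing
```

Because the mapping is reversible (as long as the text contains no grave accents or straight marks of its own), neither form carries more information than the other: in both, the apostrophe in don't and the closing quotation mark are the same character.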

  • ktschwarz wrote:

    Thanks very much for following up! Your contacts do confirm that close-curly single quotes were not distinguished from apostrophes by the 1980s typists in any way, right?

    I’m still wondering when they decided to make that distinction, and why. You have a version saved from several years ago, don’t you? Was it there already? The distinction is there in the 2017 version of transgender that you previously posted (curly single quotes in the word ‘transgender’ in the 1983 quotation, straight apostrophe in Bruce Laker's in the 1984 quotation).

  • Not in those terms exactly, but it’s clear that curly quotes were coded at the time of typing. Straight apostrophes might have been used in e.g. pronunciations, but the typists themselves would not have changed curly marks in print OED1 or SUP2 to straight. Details on encoding can be found here: https://cs.uwaterloo.ca/research/tr/1986/CS-86-20.pdf

    I think the answer to when is maybe in the lead-up to the 1992 CD-ROM (or perhaps a subsequent release) and certainly before 2000. I don’t have the 1992 CD-ROM text per se but the “Previous Version (1989)” text still available on /OED Online/ should be pretty close to what came out then. My 2004 OED2 CD-ROM makes the curly/straight distinction in a text that is otherwise nearly identical to the 1989 UW file (it includes draft additions at the bottom of entries).
