OED Work on “Writing and Editing” Podcast

I talked with Wayne Jones the other day about my work on the Oxford English Dictionary. The result was this short piece on my plans for an updated OED bibliography and Variorum:

182. Enhancing the Oxford English Dictionary

7 Comments

  • kts wrote:

    Thanks for the update! Do I understand correctly that what you got on the thumb drive was the source files from which the OED2 was created, with all the markup tags? The pages that it now shows at the “Previous version: OED2 (1989)” links were presumably created from those files, translated into current HTML. I have been wondering about those pages, since they are visibly different from the printed OED2 in the treatment of apostrophes: in the web version, apostrophes are represented by straight marks ('), while closing single quote marks are curly (’ aka high-9). For example, note the difference between the apostrophe in don't and the quotation marks in this quotation under argumentatively:

    ‘I don't call it honouring the Sabbath to sit down to a worse dinner than on a work-a-day,’ Jim remarked argumentatively.

    This difference is not visible in the printed OED2, which represented both by curly marks. But since it’s in the web pages, presumably it must have been in the original markup? Were the typists instructed to make this distinction?

    The OED3 continues to make this distinction in new and revised entries, and they impose it on quotations, too. They must think it’s necessary, but I don’t understand why. What is it good for?

  • I’ve wondered about this – and the situation is even stranger in the 1989 text I have, which in the quot you cite looks like this:

    `I don't call it honouring the Sabbath to sit down to a worse dinner than on a work-a-day,' Jim remarked…

    The open-quotation mark is actually ASCII-96, “Grave Accent”, while both the apostrophe and close-quotation marks are ASCII-39 “Single Quote”.
    If I had to guess I’d say that the typists keyed in Grave Accent for open-quotes and single quotes for everything else because that’s what was available on their keyboards and in the standard ASCII set. At some later point the text was likely algorithmically revised to the Unicode curly open- and close- quotes you’re seeing in “Previous version”, but no one bothered to revise the apostrophes.
    To confirm this you’d have to find an anomalous artefact of the replacement algorithm in those “Previous version” pages, but I’m not aware of any way to search these pages.
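
One way to check which marks a given copy of the text actually contains is to tally the quote-like characters by code point. The following is only a minimal sketch: the function name `tally_quote_marks` and the set of characters it inspects are illustrative assumptions, not part of any OED tooling. Note that Unicode's formal name for ASCII-39 is APOSTROPHE, even though it also serves as a single quote.

```python
import unicodedata
from collections import Counter

# Characters of interest: ASCII grave accent, ASCII apostrophe,
# and the two Unicode curly single quotation marks.
QUOTE_LIKE = {"`", "'", "\u2018", "\u2019"}

def tally_quote_marks(text):
    """Count quote-like characters, keyed by code point and Unicode name.

    Hypothetical helper for illustration only.
    """
    counts = Counter(ch for ch in text if ch in QUOTE_LIKE)
    return {
        f"U+{ord(ch):04X} {unicodedata.name(ch)}": n
        for ch, n in counts.items()
    }

# The 1989-style keying: grave accent to open, ASCII-39 for everything else.
sample_1989 = "`I don't call it honouring the Sabbath,' Jim remarked"
print(tally_quote_marks(sample_1989))
```

Run on the 1989-style line, this reports one grave accent and two ASCII apostrophes, which matches the keying described above.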

  • ktschwarz wrote:

    That 1989 text is just what would be expected given the limits of a standard keyboard, as you say (though since the typing was subcontracted to a specialized company, we don’t know for sure that they used standard keyboards). That’s not strange. The strange part is how they changed it afterward. How could it have been algorithmically revised, if the difference between apostrophes and close quotes wasn’t already coded? Algorithms are not that good. Can you show me an algorithm that correctly converts this input to this output, in the definition of flesh, sense 4a:

    in recent use primarily suggesting `butchers' meat', not poultry, etc.
    in recent use primarily suggesting ‘butchers' meat’, not poultry, etc.

    … while also getting this conversion right under hollow meat?

    `poultry, rabbits, etc., any meat not sold by butchers'
    ‘poultry, rabbits, etc., any meat not sold by butchers’

    The online version has butchers' with a straight apostrophe in the first one and not the second, though they were both curly on the printed page — and this is not just one case that might have been hand-repaired, it’s throughout.

    That’s why I thought it was coded by the original typists; if it wasn’t them, humans must have been involved at some point. And *why*? Why change the “previous version” pages, which we expect to be faithful (up to the limits of online typography; they can’t reproduce some characters from the print version)? Why invest so much effort into an artificial distinction that wasn’t there in the source text?

    If you can get answers, I’d be interested in hearing them.

  • It’s a bit of a puzzle, I’ll grant you. I think the algorithmic part wouldn’t be too hard in the case you cite, and would be helped by the fact that “open quote” is already marked out – you could decide most “s'”es based on whether there’s a valid close quote after it. But still in a text this large and varied there should be some inputs to flummox an algorithm and I haven’t found any artefacts of that as of yet.
    I /have/ found a couple of odd cases in which “grave accent” got changed to single quote, however, e.g. in surnames between initial “M” and a capital when in author name fields (eg T. M`C. Anderson), but not when in quotation text (eg M`Donnell).

  • A look through dialect quots is turning up some further challenges, eg.:
    1891 J. Baron Blegburn Dickshonary 44 ‘Aw'm gooin' to meet mi Moll to-neet’ is a varra common sayin' wi' factory lads: some o' th' better soort say ‘woman’ i' th' place o' Moll, but nooan so mony.

  • ktschwarz wrote:

    You think the algorithm wouldn’t be too hard? Try to write it! You don’t have prior information about which close-quotes are valid. You don’t have prior information on whether a word ending in s’ is a plural possessive or an apocope, as in ‘Bes’ li’l’ poker player in this hull state. Wanna sit in, stranger?’.

    The problem could at least be narrowed down mechanically:
    1. Change close-quotes at the beginning of words to apostrophes, as they always represent clipping (apheresis), as in ’tis, ’60s, etc. (This only works because human typists have already coded the open-quotes separately!)
    2. Change close-quotes within words to apostrophes. That covers possessives, elisions, abbreviations, clitics, special-case plurals (X’s), and foreign words (pa’anga).
    3. Close-quotes at the end of words may be possessives, apocope, or actual close-quotes. If there’s no preceding open-quote, it’s safe to change them to apostrophes.
    4. Close-quotes that occur in a perfectly ordered sequence open/close/open/close/… should be left as is.
    5. Flag all remaining undecided close-quotes for human attention.
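
The five steps above can be sketched roughly as follows. This is an illustrative reading, not anything the OED is known to have run: the function name `resolve_close_quotes` is hypothetical, the input is assumed to use curly ‘ for open-quotes and curly ’ for every other mark, and "perfectly ordered" is interpreted narrowly as exactly one undecided close-quote following each open-quote. Anything looser than that gets flagged.

```python
OPEN, CLOSE, APOS = "\u2018", "\u2019", "'"

def resolve_close_quotes(text):
    """Return (new_text, flagged_indexes) for one passage.

    Hypothetical sketch of the five-step narrowing; assumes open-quotes
    are already coded as U+2018 and all other marks as U+2019.
    """
    chars = list(text)
    flagged = []
    pending = None  # None until the first open-quote; then undecided indexes

    def settle(candidates):
        # Step 4: exactly one candidate after an open-quote is left as is.
        # Step 5: more than one candidate is ambiguous, so flag them all.
        if candidates is not None and len(candidates) > 1:
            flagged.extend(candidates)

    for i, ch in enumerate(text):
        if ch == OPEN:
            settle(pending)
            pending = []
        elif ch == CLOSE:
            followed_by_word = i + 1 < len(text) and text[i + 1].isalnum()
            if followed_by_word:
                chars[i] = APOS      # steps 1 and 2: word-initial or word-internal
            elif pending is None:
                chars[i] = APOS      # step 3: no open-quote has come before
            else:
                pending.append(i)    # word-final, undecided for now
    settle(pending)
    return "".join(chars), flagged

flesh = "suggesting \u2018butchers\u2019 meat\u2019, not poultry, etc."
hollow = "\u2018poultry, rabbits, etc., any meat not sold by butchers\u2019"
print(resolve_close_quotes(flesh))
print(resolve_close_quotes(hollow))
```

On the flesh definition, both marks after the open-quote end up flagged, since nothing mechanical separates the possessive from the real close-quote. The hollow meat definition passes through untouched because it has a single candidate.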

    But that’s still going to leave a lot of quotations, probably thousands, maybe tens of thousands. That’s too many to fix casually; it had to be planned. And where’s the benefit from all this work? It does *not* make any difference to the search function, which still does *not* distinguish apostrophes from close-quotes!

  • ktschwarz wrote:

    I think the change from T. M`C. Anderson to T. M'C. Anderson is a policy change on M' names. This Anderson is known to Wikipedia as Thomas McCall Anderson; his Treatise on Diseases of the Skin spells his name as “T. M’CALL ANDERSON, M.D.” on the title page, with a curly apostrophe/close-quote. The spelling of M' or Mc names has been very variable; sometimes a curly open-quote has been used perhaps because it looks a bit like a superscript c, but then you also see these names spelled with curly apostrophes, or straight ones, or with Mc. The OED has apparently now decided on M' with straight apostrophe for all such names. And the online version has simplified Anderson’s initials to just “T. M. Anderson”, even before full revision.

    Looks like they’ve done the same to some other M' names: e.g. under vacillating, the author’s name M‘Arthur (with open-quote) in previous print editions is now M'Arthur.

    What entry is M`Donnell in? Did you mean O’Donnell?

    If you want a really extreme apostrophe/quote challenge, check out the quotation from Evil-Eye Fleegle under whammy!
