I talked with Wayne Jones the other day about my work on the Oxford English Dictionary. The result was this short piece on my plans for an updated OED bibliography and Variorum:
Thanks for the update! Do I understand correctly that what you got on the thumb drive was the source files from which the OED2 was created, with all the markup tags? The pages that it now shows at the “Previous version: OED2 (1989)” links were presumably created from those files, translated into current HTML. I have been wondering about those pages, since they are visibly different from the printed OED2 in the treatment of apostrophes: in the web version, apostrophes are represented by straight marks ('), while closing single quote marks are curly (’ aka high-9). For example, note the difference between the apostrophe in don't and the quotation marks in this quotation under argumentatively:
This difference is not visible in the printed OED2, which represented both by curly marks. But since it’s in the web pages, presumably it must have been in the original markup? Were the typists instructed to make this distinction?
The OED3 continues to make this distinction in new and revised entries, and they impose it on quotations, too. They must think it’s necessary, but I don’t understand why. What is it good for?
I’ve wondered about this – and the situation is even stranger in the 1989 text I have, which in the quot you cite looks like this:
The open-quotation mark is actually ASCII-96, “Grave Accent”, while both the apostrophe and close-quotation marks are ASCII-39 “Single Quote”.
If I had to guess I’d say that the typists keyed in Grave Accent for open-quotes and single quotes for everything else because that’s what was available on their keyboards and in the standard ASCII set. At some later point the text was likely algorithmically revised to the Unicode curly open- and close- quotes you’re seeing in “Previous version”, but no one bothered to revise the apostrophes.
To confirm this you’d have to find an anomalous artefact of the replacement algorithm in those “Previous version” pages, but I’m not aware of any way to search these pages.
That 1989 text is just what would be expected given the limits of a standard keyboard, as you say (though since the typing was subcontracted to a specialized company, we don’t know for sure that they used standard keyboards). That’s not strange. The strange part is how they changed it afterward. How could it have been algorithmically revised, if the difference between apostrophes and close quotes wasn’t already coded? Algorithms are not that good. Can you show me an algorithm that correctly converts this input to this output, in the definition of flesh, sense 4a:
… while also getting this conversion right under hollow meat?
The online version has butchers' with a straight apostrophe in the first one and not the second, though they were both curly on the printed page — and this is not just one case that might have been hand-repaired, it’s throughout.
That’s why I thought it was coded by the original typists; if it wasn’t them, humans must have been involved at some point. And *why*? Why change the “previous version” pages, which we expect to be faithful (up to the limits of online typography; they can’t reproduce some characters from the print version)? Why invest so much effort into an artificial distinction that wasn’t there in the source text?
If you can get answers, I’d be interested in hearing them.
It’s a bit of a puzzle, I’ll grant you. I think the algorithmic part wouldn’t be too hard in the case you cite, and would be helped by the fact that “open quote” is already marked out – you could decide most “s'”es based on whether there’s a valid close quote after it. But still in a text this large and varied there should be some inputs to flummox an algorithm and I haven’t found any artefacts of that as of yet.
I /have/ found a couple of odd cases in which “grave accent” got changed to single quote, however, e.g. in surnames between initial “M” and a capital when in author name fields (eg T. M`C. Anderson), but not when in quotation text (eg M`Donnell).
A look through dialect quots is turning up some further challenges, eg.:
1891 J. Baron Blegburn Dickshonary 44 ‘Aw'm gooin' to meet mi Moll to-neet’ is a varra common sayin' wi' factory lads: some o' th' better soort say ‘woman’ i' th' place o' Moll, but nooan so mony.
You think the algorithm wouldn’t be too hard? Try to write it! You don’t have prior information about which close-quotes are valid. You don’t have prior information on whether a word ending in s’ is a plural possessive or an apocope, as in ‘Bes’ li’l’ poker player in this hull state. Wanna sit in, stranger?’.
The problem could at least be narrowed down mechanically:
1. Change close-quotes at the beginning of words to apostrophes, as they always represent clipping (apheresis), as in ’tis, ’60s, etc. (This only works because human typists have already coded the open-quotes separately!)
2. Change close-quotes within words to apostrophes. That covers possessives, elisions, abbreviations, clitics, special-case plurals (X’s), and foreign words (pa’anga).
3. Close-quotes at the end of words may be possessives, apocope, or actual close-quotes. If there’s no preceding open-quote, it’s safe to change them to apostrophes.
4. Close-quotes that occur in a perfectly ordered sequence open/close/open/close/… should be left as is.
5. Flag all remaining undecided close-quotes for human attention.
But that’s still going to leave a lot of quotations, probably thousands, maybe tens of thousands. That’s too many to fix casually; it had to be planned. And where’s the benefit from all this work? It does *not* make any difference to the search function, which still does *not* distinguish apostrophes from close-quotes!
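For anyone who wants to try it, here is a rough Python sketch of the five steps. It is only an illustration, not anything the OED is known to have used. It assumes the 1989-style input described earlier in the thread, with ASCII 96 for open-quotes and ASCII 39 for everything else. The name classify_marks is made up for this example, and it reads "no preceding open-quote" in step 3 as "no open-quote earlier in the same quotation".

```python
OPEN = "`"   # ASCII 96, keyed for open-quotes in the 1989 text
MARK = "'"   # ASCII 39, keyed for both apostrophes and close-quotes


def classify_marks(text):
    """Label every MARK in text as 'apostrophe', 'close', or 'flag'.

    Returns a list of (index, label) pairs in text order.
    """
    labels = {}
    seen_open = False
    pending = []  # word-final marks seen since the most recent open-quote

    def settle():
        # Step 4: exactly one candidate after an open-quote is a clean pair.
        # Step 5: anything messier is flagged for a human.
        label = "close" if len(pending) == 1 else "flag"
        for pos in pending:
            labels[pos] = label
        pending.clear()

    for i, ch in enumerate(text):
        if ch == OPEN:
            settle()
            seen_open = True
        elif ch == MARK:
            followed_by_letter = i + 1 < len(text) and text[i + 1].isalnum()
            if followed_by_letter:
                # Steps 1 and 2: word-initial or word-internal mark.
                labels[i] = "apostrophe"
            elif not seen_open:
                # Step 3: word-final mark with no open-quote before it.
                labels[i] = "apostrophe"
            else:
                pending.append(i)
    settle()
    return sorted(labels.items())


# An invented easy case: one possessive, then one clean quoted phrase.
easy = "the butchers' shops sell `hollow meat'"
print([label for _, label in classify_marks(easy)])
# ['apostrophe', 'close']

# The poker-player quotation, re-keyed in the assumed ASCII form.
hard = "`Bes' li'l' poker player in this hull state. Wanna sit in, stranger?'"
print([label for _, label in classify_marks(hard)])
# ['flag', 'apostrophe', 'flag', 'flag']
```

The easy case resolves mechanically, but the poker-player quotation ends up with three of its four marks flagged, and the Blegburn Dickshonary quotation fares worse: keyed the same way, ten of its eleven marks are flagged. That residue is the part that still needs a human.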
I think the change from T. M`C. Anderson to T. M'C. Anderson is a policy change on M' names. This Anderson is known to Wikipedia as Thomas McCall Anderson; his Treatise on Diseases of the Skin spells his name as “T. M’CALL ANDERSON, M.D.” on the title page, with a curly apostrophe/close-quote. The spelling of M' or Mc names has been very variable; sometimes a curly open-quote has been used perhaps because it looks a bit like a superscript c, but then you also see these names spelled with curly apostrophes, or straight ones, or with Mc. The OED has apparently now decided on M' with straight apostrophe for all such names. And the online version has simplified Anderson’s initials to just “T. M. Anderson”, even before full revision.
Looks like they’ve done the same to some other M' names: e.g. under vacillating, the author’s name M‘Arthur (with open-quote) in previous print editions is now M'Arthur.
What entry is M`Donnell in? Did you mean O’Donnell?
If you want a really extreme apostrophe/quote challenge, check out the quotation from Evil-Eye Fleegle under whammy!