On 1/25/2019 9:39 AM, James Tauber via Unicode wrote:
Thank you, although the word break does still affect things like double-clicking to select.

And people do seem to want to use U+02BC for this reason (and I'm trying to articulate why that isn't what U+02BC is meant for).

For normal editing operations, breaking selection for "d'Artagnan" or "can't" into two is overly fussy.

No wonder people get frustrated.

A./

James

On Fri, Jan 25, 2019 at 12:34 PM Mark Davis ☕️ <m...@macchiato.com> wrote:
U+2019 is normally the character used, except where the ’ is considered a letter. When it is between letters it doesn't cause a word break, but because it is also a right single quote, at the end of words there is a break. Thus in a phrase like «tryin’ to go» there is a word break after the n, because one can't tell whether the ’ is an apostrophe or a closing quotation mark.

So something like "δ’ αρχαια" (picking a phrase at random) would have a word break after the delta. 

Word break: 
δ αρχαια 

However, there is no line break between them (which is the more important operation in normal usage). Probably not worth tailoring the word break.

Line break:
δ’ αρχαια 
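
The between-letters behaviour described above can be sketched in a few lines of Python. This is only a simplified model of rules WB5-WB7 of UAX #29, not a full implementation: the name word_tokens and the MID set are made up for illustration, and str.isalpha() is assumed to be a good enough stand-in for the ALetter class.

```python
# Simplified model of UAX #29 rules WB5-WB7 only; not a full word breaker.
# Assumption: MID stands in for the MidNumLet / Single_Quote characters.
MID = {"'", "\u2019"}


def word_tokens(text):
    """Return runs of letters; a MID character is kept only between two letters."""
    tokens, current = [], ""
    for i, ch in enumerate(text):
        if ch.isalpha():
            current += ch
        elif (ch in MID and current
              and i + 1 < len(text) and text[i + 1].isalpha()):
            # Letter on both sides: no word break around the apostrophe.
            current += ch
        else:
            # Anything else ends the current word.
            if current:
                tokens.append(current)
            current = ""
    if current:
        tokens.append(current)
    return tokens


print(word_tokens("can\u2019t"))        # ['can’t']
print(word_tokens("δ\u2019 αρχαια"))    # ['δ', 'αρχαια']
print(word_tokens("δ\u02bc αρχαια"))    # ['δʼ', 'αρχαια']
```

The last call shows why U+02BC looks attractive: it is a letter (general category Lm), so it stays attached to the delta without any tailoring.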

Mark


On Fri, Jan 25, 2019 at 1:10 PM James Tauber via Unicode <unicode@unicode.org> wrote:
There seems to be some debate amongst digital classicists about whether to use U+2019 or U+02BC to represent the apostrophe in Ancient Greek when marking elision (e.g. δ’ for δέ preceding a word starting with a vowel).

It seems to me that U+2019 is the technically correct choice per the Unicode Standard but it is not without at least one problem: default word breaking rules.

I'm trying to provide guidelines for digital classicists in this regard.

Is it correct to say the following:

1) U+2019 is the correct character to use for the apostrophe in Ancient Greek when marking elision. 
2) Using U+02BC for this purpose is a misuse of a modifier letter.
3) However, use of U+2019 (unlike U+02BC) means the default Word Boundary Rules in UAX#29 will (incorrectly) exclude the apostrophe from the word token
4) And use of U+02BC (unlike U+2019) means the Grapheme Cluster Boundary Rules in UAX#29 will (incorrectly) include the apostrophe as part of a grapheme cluster with the previous letter
5) The correct solution is to tailor the Word Boundary Rules in the case of Ancient Greek to treat U+2019 as not breaking a word (which shouldn't have the same ambiguity problems with the single quotation mark as in English as it should not be used as a quotation mark in Ancient Greek)
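
As a rough illustration of what point 5 might look like in practice, here is a minimal regex-based sketch. It is not a UAX #29 implementation or any library's tailoring API; the names LETTER, DEFAULT, TAILORED and tokenize are hypothetical, and a "letter" is assumed to be any \w character that is not a digit or underscore.

```python
import re

APOSTROPHE = "\u2019"     # RIGHT SINGLE QUOTATION MARK
# Assumption: a "letter" is any \w character except digits and underscore.
LETTER = r"[^\W\d_]"

# Default-like behaviour: U+2019 is kept only when letters sit on both sides.
DEFAULT = re.compile(f"{LETTER}+(?:{APOSTROPHE}{LETTER}+)*")
# Hypothetical Ancient Greek tailoring: a U+2019 directly after a letter
# also stays with the word, so an elided form keeps its apostrophe.
TAILORED = re.compile(f"{LETTER}+(?:{APOSTROPHE}{LETTER}+)*{APOSTROPHE}?")


def tokenize(text, tailored=False):
    """Return the word tokens of text under the default or the tailored pattern."""
    pattern = TAILORED if tailored else DEFAULT
    return pattern.findall(text)


print(tokenize("δ\u2019 αρχαια"))                  # ['δ', 'αρχαια']
print(tokenize("δ\u2019 αρχαια", tailored=True))   # ['δ’', 'αρχαια']
```

The tailored pattern would also swallow a closing single quote that directly follows a word, which is exactly the ambiguity that is assumed not to arise in Ancient Greek text.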

Many thanks in advance.

James


--
James Tauber
Greek Linguistics: https://jktauber.com/
Digital Tolkien: https://digitaltolkien.com/

Twitter: @jtauber

