> breaking selection for "d'Artagnan" or "can't" into two is overly fussy.

True, and that is not what U+2019 does; it does not break medially.

Mark

On Fri, Jan 25, 2019 at 11:07 PM Asmus Freytag via Unicode <unicode@unicode.org> wrote:

> On 1/25/2019 9:39 AM, James Tauber via Unicode wrote:
>
> Thank you, although the word break does still affect things like
> double-clicking to select.
>
> And people do seem to want to use U+02BC for this reason (and I'm trying
> to articulate why that isn't what U+02BC is meant for).
>
> For normal edition operations, breaking selection for "d'Artagnan" or
> "can't" into two is overly fussy.
>
> No wonder people get frustrated.
>
> A./
>
> James
>
> On Fri, Jan 25, 2019 at 12:34 PM Mark Davis ☕️ <m...@macchiato.com> wrote:
>
>> U+2019 is normally the character used, except where the ’ is considered a
>> letter. When it is between letters it doesn't cause a word break, but
>> because it is also a right single quote, at the end of words there is a
>> break. Thus in a phrase like «tryin’ to go» there is a word break after the
>> n, because one can't tell.
>>
>> So something like "δ’ αρχαια" (picking a phrase at random) would have a
>> word break after the delta.
>>
>> Word break:
>> δ’ αρχαια
>>
>> However, there is no *line break* between them (which is the more
>> important operation in normal usage). Probably not worth tailoring the word
>> break.
>>
>> Line break:
>> δ’ αρχαια
>>
>> Mark
>>
>>
>> On Fri, Jan 25, 2019 at 1:10 PM James Tauber via Unicode <
>> unicode@unicode.org> wrote:
>>
>>> There seems some debate amongst digital classicists in whether to use
>>> U+2019 or U+02BC to represent the apostrophe in Ancient Greek when marking
>>> elision. (e.g. δ’ for δέ preceding a word starting with a vowel).
>>>
>>> It seems to me that U+2019 is the technically correct choice per the
>>> Unicode Standard but it is not without at least one problem: default word
>>> breaking rules.
>>>
>>> I'm trying to provide guidelines for digital classicists in this regard.
>>>
>>> Is it correct to say the following:
>>>
>>> 1) U+2019 is the correct character to use for the apostrophe in Ancient
>>> Greek when marking elision.
>>> 2) U+02BC is a misuse of a modifier for this purpose
>>> 3) However, use of U+2019 (unlike U+02BC) means the default Word
>>> Boundary Rules in UAX#29 will (incorrectly) exclude the apostrophe from the
>>> word token
>>> 4) And use of U+02BC (unlike U+2019) means Glyph Cluster Boundary Rules
>>> in UAX#29 will (incorrectly) include the apostrophe as part of a glyph
>>> cluster with the previous letter
>>> 5) The correct solution is to tailor the Word Boundary Rules in the case
>>> of Ancient Greek to treat U+2019 as not breaking a word (which shouldn't
>>> have the same ambiguity problems with the single quotation mark as in
>>> English as it should not be used as a quotation mark in Ancient Greek)
>>>
>>> Many thanks in advance.
>>>
>>> James
>>>
>>
>
> --
> *James Tauber*
> Greek Linguistics: https://jktauber.com/
> Music Theory: https://modelling-music.com/
> Digital Tolkien: https://digitaltolkien.com/
>
> Twitter: @jtauber
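The word-break behaviour discussed in this thread can be illustrated with a toy segmenter. This is a minimal Python sketch, not a real UAX #29 implementation: it assumes a heavily simplified subset of the rules (only WB5, WB6, WB7 and the catch-all WB999), gives U+2019 its default Word_Break value MidNumLet, and treats every alphabetic character (including U+02BC, which is General_Category Lm) as ALetter. It does not implement the whitespace rule WB3d, so each space is its own segment. The names `word_break_class`, `segment_words` and the `tailor_apostrophe` flag are hypothetical; the flag stands in for the tailoring proposed in point 5.

```python
from __future__ import annotations

import unicodedata

RIGHT_SINGLE_QUOTE = "\u2019"   # ’ RIGHT SINGLE QUOTATION MARK (General_Category Pf)
MODIFIER_APOSTROPHE = "\u02bc"  # ʼ MODIFIER LETTER APOSTROPHE (General_Category Lm)


def word_break_class(ch: str, tailor_apostrophe: bool = False) -> str:
    """Map a character to a tiny, hypothetical subset of Word_Break values."""
    if ch == RIGHT_SINGLE_QUOTE:
        # Default UAX #29 value is MidNumLet; the optional tailoring
        # treats U+2019 as part of the word instead.
        return "ALetter" if tailor_apostrophe else "MidNumLet"
    if ch.isalpha():
        # Any letter, including Lm characters such as U+02BC, counts as ALetter here.
        return "ALetter"
    return "Other"


def segment_words(text: str, tailor_apostrophe: bool = False) -> list[str]:
    """Split text using only WB5, WB6, WB7 and WB999 (no whitespace rules)."""
    classes = [word_break_class(c, tailor_apostrophe) for c in text]
    segments: list[str] = []
    start = 0
    for i in range(1, len(text)):
        before2 = classes[i - 2] if i >= 2 else None
        before = classes[i - 1]
        after = classes[i]
        after2 = classes[i + 1] if i + 1 < len(text) else None
        if before == "ALetter" and after == "ALetter":
            continue  # WB5: ALetter × ALetter
        if before == "ALetter" and after == "MidNumLet" and after2 == "ALetter":
            continue  # WB6: ALetter × MidNumLet ALetter
        if before2 == "ALetter" and before == "MidNumLet" and after == "ALetter":
            continue  # WB7: ALetter MidNumLet × ALetter
        segments.append(text[start:i])  # WB999: otherwise, break here
        start = i
    if text:
        segments.append(text[start:])
    return segments


if __name__ == "__main__":
    for ch in (RIGHT_SINGLE_QUOTE, MODIFIER_APOSTROPHE):
        print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
    print(segment_words("can\u2019t"))                # ['can’t']
    print(segment_words("δ\u2019 αρχαια"))            # ['δ', '’', ' ', 'αρχαια']
    print(segment_words("δ\u2019 αρχαια", True))      # ['δ’', ' ', 'αρχαια']
    print(segment_words("δ\u02bc αρχαια"))            # ['δʼ', ' ', 'αρχαια']
```

Under these assumptions U+2019 does not split "can’t", because WB6 and WB7 only apply when a letter follows; after δ and before a space, however, no rule applies and the apostrophe is separated, which matches the behaviour described for «tryin’ to go» and "δ’ αρχαια". The sketch says nothing about line breaking (UAX #14) or grapheme clusters.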