On Sun, Jan 27, 2019 at 1:22 PM Richard Wordingham via Unicode <unicode@unicode.org> wrote:
> Except that Unicode-compliant processes aren't required to follow the
> scheme of TR29 Unicode Text Segmentation. However, it is only required
> to select the whole word because the U+2019 is followed by a letter.
> TR29 prescribes different behaviour for "dogs'" with U+2019 (interpret
> as two 'words') and U+02BC (interpret as one word). The GTK-based
> email client I'm using has that difference, but also fails with
> "don't" unless one uses U+02BC.
>
> However, LibreOffice treats "don't" as a single word for U+0027, U+02BC
> and U+2019, but "dogs'" as a single word only for U+02BC. This
> complies with TR29. I'm not surprised, as LibreOffice does use or has
> used ICU.

This comes back to my original question that started this thread.

Many people creating Ancient Greek digital resources use U+02BC, seemingly because of incorrect word-breaking with *word-final* U+2019 (which is the only position in which it occurs in Ancient Greek, where it always marks elision and never the end of a quotation).

I am trying to write guidelines explaining why they should use U+2019. I'm convinced it's technically the right code point to use, but I want to get my facts straight about how to address the word-breaking issue (specifically for word-final U+2019 in Ancient Greek, to be clear).

In my original post, I asked whether a language-specific tailoring of the text segmentation algorithm was the solution, but so far no one here has agreed.

Here's a concrete example from Smyth's Grammar:

γένοιτ’ ἄν

Double-clicking on the first word should select the U+2019 as well. Interestingly, on macOS Mojave it does in Pages[1] but not in Notes, the Terminal or here in Gmail on Chrome.

To be clear: when I say "should" I mean that this is the expectation classicists have, and the failure to meet it is why some of them insist on using U+02BC.

I'm happy if the answer is "use U+2019 and go get your text segmentation implementations fixed"[2] but am looking for confirmation of that.
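To make the behaviour under discussion concrete, here is a rough Python sketch of the relevant UAX #29 word-boundary rules (WB4 for combining marks, WB5 for letter × letter, WB6/WB7 for letter × apostrophe × letter). It rests on heavy simplifying assumptions: it only distinguishes letters, combining marks and the two apostrophes U+0027 and U+2019 (both fall under MidNumLetQ in WB6/WB7), and ignores every other rule. The `words` function and its `tailor_final_elision` flag are hypothetical illustrations of the kind of language-specific tailoring asked about above, not an option of ICU or any other library.

```python
import unicodedata

# Simplified stand-in for the UAX #29 MidNumLetQ class: only the two
# apostrophes discussed in this thread (assumption; the real class is larger).
MID_NUM_LET_Q = {"\u0027", "\u2019"}
RIGHT_SINGLE_QUOTE = "\u2019"


def classify(ch: str) -> str:
    """Very rough word-break class for a single character."""
    if ch in MID_NUM_LET_Q:
        return "MidNumLetQ"
    category = unicodedata.category(ch)
    if category.startswith("L"):  # includes U+02BC MODIFIER LETTER APOSTROPHE (Lm)
        return "ALetter"
    if category in ("Mn", "Mc", "Me"):  # combining marks, e.g. a decomposed tonos
        return "Extend"
    return "Other"


def words(text: str, tailor_final_elision: bool = False) -> list[str]:
    """Return the letter-bearing words of text under the simplified rules.

    tailor_final_elision is a hypothetical switch: when True, a U+2019 that
    immediately follows a word is kept as part of that word.
    """
    classes = [classify(ch) for ch in text]
    n = len(text)
    result = []
    i = 0
    while i < n:
        if classes[i] != "ALetter":
            i += 1  # skip spaces, punctuation and stray marks
            continue
        start = i
        i += 1
        while i < n:
            if classes[i] in ("ALetter", "Extend"):
                i += 1  # WB5 (letter x letter) and WB4 (marks attach)
            elif (classes[i] == "MidNumLetQ"
                  and i + 1 < n and classes[i + 1] == "ALetter"):
                i += 2  # WB6/WB7: letter x apostrophe x letter
            else:
                break
        if tailor_final_elision and i < n and text[i] == RIGHT_SINGLE_QUOTE:
            i += 1  # hypothetical tailoring: keep a word-final U+2019
        result.append(text[start:i])
    return result


for sample in ["don\u2019t", "dogs\u2019", "dogs\u02bc", "γένοιτ\u2019 ἄν"]:
    print(repr(sample), words(sample), words(sample, tailor_final_elision=True))
```

Under these assumptions, "don’t" stays whole because the U+2019 sits between two letters, "dogs’" loses its final U+2019, "dogsʼ" with U+02BC stays whole because U+02BC is itself a letter (general category Lm), and "γένοιτ’" only keeps its apostrophe with the tailoring switched on. A real tailoring would have to be limited to Greek text, since in English a word-final U+2019 may well be a closing quotation mark.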
James

[1] To be honest, I was impressed Pages got it right.
[2] In the same spirit as "if certain combining character combinations don't work, the solution is not to add precomposed characters, it's to improve the fonts" or "tonos and oxia are the same and if they look different, it's the fault of your font".