On 2019-01-27 11:38 PM, Richard Wordingham via Unicode wrote:
> On Sun, 27 Jan 2019 19:57:37 +0000
> James Kass via Unicode <unicode@unicode.org> wrote:
>
>> On 2019-01-27 7:09 PM, James Tauber via Unicode wrote:
>>> In my original post, I asked if a language-specific tailoring of
>>> the text segmentation algorithm was the solution but no one here
>>> has agreed so far.
>>
>> If there are likely to be many languages requiring exceptions to the
>> segmentation algorithm wrt U+2019, then perhaps it would be better to
>> establish conventions using ZWJ/ZWNJ and adjust the algorithm
>> accordingly so that it would work across languages.  (Rather than
>> requiring additional and open-ended language-specific tailorings.)
>> (I inserted several combinations of ZWJ/ZWNJ into James Tauber's
>> example, but couldn't improve the segmentation in LibreOffice,
>> although it was possible to make it worse.)
>
> If you look at TR29, you will see that ZWJ should only affect word
> boundaries for emoji.  ZWNJ shall have no effect.  What you want is a
> control that joins words, but we don't have that.
>
> Richard.


(https://unicode.org/reports/tr29/)
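
Richard's reading can be checked against an actual UAX #29 implementation.  A quick sketch in Python with PyICU (assuming the PyICU package is available; putting the joiner right after U+2019 is just one of the combinations one might try):

import icu

def word_segments(text):
    # Segment text at UAX #29 word boundaries via ICU's BreakIterator.
    # Offsets are UTF-16 code units, which match Python str indices here
    # because everything in the sample is in the BMP.
    bi = icu.BreakIterator.createWordInstance(icu.Locale("en"))
    bi.setText(text)
    bounds = [0] + list(bi)   # iterating the BreakIterator yields boundaries
    return [text[s:e] for s, e in zip(bounds, bounds[1:])]

base = "γένοιτ\u2019 ἄν"      # James Tauber's example, U+2019 after the tau
for label, joiner in [("plain", ""), ("+ZWJ", "\u200D"), ("+ZWNJ", "\u200C")]:
    print(label, word_segments(base.replace("\u2019", "\u2019" + joiner)))

If I'm reading TR29 right, rule WB4 just skips both of them here (ZWJ only matters before emoji, via WB3c), so a conformant implementation shouldn't move any boundary in any of the three variants.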

It’s been said that the text segmentation rules seem over-complicated and are probably non-trivial to implement properly.  I tried your suggestion of WORD JOINER U+2060 after the tau ( γένοιτ⁠’ ἄν , with the invisible U+2060 between the tau and the apostrophe), but it only added yet another word break in LibreOffice.
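
For comparison, the same check with WORD JOINER, reusing word_segments from the sketch above:

# U+2060 between the tau and U+2019, as in the LibreOffice test above.
# TR29 gives U+2060 Word_Break=Format, and rule WB4 skips Format characters,
# so a conformant segmenter should report the same boundaries for both lines;
# any extra break would be LibreOffice's own doing.
print(word_segments("γένοιτ\u2019 ἄν"))
print(word_segments("γένοιτ\u2060\u2019 ἄν"))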

The problem may stem from the fact that WORD JOINER is specified to be treated as though it were ZERO WIDTH NO-BREAK SPACE (U+FEFF) in its word-joining role.  IOW its name and description both say *space*, and an implementation that takes that at face value classifies it as a space, and a space indicates a word break.  That doesn’t seem right.

Instead of treating WORD JOINER as a SPACE, why not treat it as a WORD JOINER?  Doing so could prevent a lot of undesirable string segmentation, and it might also minimize future language-specific tailorings and ease the burden on implementers.
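
In the meantime, a pragmatic workaround is post-processing rather than tailoring the algorithm: U+2019 is Word_Break=MidNumLet, so it stays word-internal only when a letter follows it, which means an elision apostrophe before a space always splits off on its own.  A hypothetical sketch (my own helper names, again building on word_segments above):

def attach_elision(segments):
    # Fold a lone U+2019 back onto the word before it (Greek elision).
    # A band-aid in application code, not a tailoring of UAX #29 itself.
    merged = []
    for seg in segments:
        if seg == "\u2019" and merged and merged[-1][-1:].isalpha():
            merged[-1] += seg
        else:
            merged.append(seg)
    return merged

print(attach_elision(word_segments("γένοιτ\u2019 ἄν")))
# hopefully ['γένοιτ’', ' ', 'ἄν'] rather than ['γένοιτ', '’', ' ', 'ἄν']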
