We had that originally, but some people objected that some languages (Arabic, as I recall) can end a run of letters with a ZWJ and immediately follow it with an emoji (no intervening space) without wanting the ZWJ to form a grapheme cluster with the following emoji. While I personally consider that a degenerate case, we tightened the definition (requiring Extended_Pictographic Extend* before the ZWJ) to prevent it.
Mark

On Tue, Jan 2, 2018 at 10:41 AM, Manish Goregaokar <[email protected]> wrote:
> In the current draft GB11 mentions Extended_Pictographic Extend* ZWJ x
> Extended_Pictographic.
>
> Can this similarly be distilled to just ZWJ x Extended_Pictographic? This
> does affect cases like <indic letter, virama, ZWJ, emoji> or <arabic
> letter, zwj, emoji> and I'm not certain if that counts as a degenerate
> case. If we do this then all of the rules except the flag emoji one become
> things which can be easily calculated with local information, which is nice
> for implementors.
>
> (Also in the current draft I think GB11 needs a `E_Modifier?` somewhere
> but if we merge that with Extend that's not going to be necessary anyway)
>
> -Manish
>
> On Tue, Jan 2, 2018 at 3:02 PM, Manish Goregaokar <[email protected]>
> wrote:
>
>> > Note: we are already planning to get rid of the GAZ/EBG distinction (
>> http://www.unicode.org/reports/tr29/tr29-32.html#GB10) in any event.
>>
>> This is great! I hadn't noticed this when I last saw that draft (I was
>> focusing on the Virama stuff). Good to know!
>>
>> > Instead, we'd add one line to
>> *Extend <http://www.unicode.org/reports/tr29/tr29-32.html#Extend>:*
>>
>> Yeah, this is essentially what I was hoping we could do.
>>
>> Is there any way to formally propose this? Or is bringing it up here good
>> enough?
>>
>> Thanks,
>>
>> -Manish
>>
>> On Mon, Jan 1, 2018 at 9:17 PM, Mark Davis ☕️ via Unicode <
>> [email protected]> wrote:
>>
>>> This is an interesting suggestion, Manish.
>>>
>>> <non-emoji-base, skin tone modifier> is a degenerate case, so if we
>>> follow your suggestion we also could drop E_Base and E_Modifier, and
>>> rule GB10.
>>>
>>> Instead, we'd add one line to *Extend
>>> <http://www.unicode.org/reports/tr29/tr29-32.html#Extend>:*
>>>
>>> OLD
>>> Grapheme_Extend = Yes
>>> *and not* GCB = Virama
>>>
>>> NEW
>>> Grapheme_Extend = Yes, or
>>> Emoji characters listed as Emoji_Modifier=Yes in emoji-data.txt. See
>>> [UTS51 <http://www.unicode.org/reports/tr41/tr41-21.html#UTS51>].
>>> *and not* GCB = Virama
>>>
>>> Note: we are already planning to get rid of the GAZ/EBG distinction (
>>> http://www.unicode.org/reports/tr29/tr29-32.html#GB10) in any event.
>>>
>>> Mark
>>>
>>> On Mon, Jan 1, 2018 at 3:52 PM, Richard Wordingham via Unicode <
>>> [email protected]> wrote:
>>>
>>>> On Mon, 1 Jan 2018 13:24:29 +0530
>>>> Manish Goregaokar via Unicode <[email protected]> wrote:
>>>>
>>>> > <random non-emoji, skin tone modifier> sounds very much like a
>>>> > degenerate case to me.
>>>>
>>>> Generally yes, but I'm not sure that they'd be inappropriate for
>>>> Egyptian hieroglyphs showing human beings. The choice of determinative
>>>> can convey unpronounceable semantic information, though I'm not sure
>>>> that that can be as sensitive as skin colour. However, in such a case
>>>> it would also be appropriate to give a skin tone modifier the property
>>>> Extend.
>>>>
>>>> Richard.
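The difference between the two formulations of GB11 debated in this thread can be illustrated with a small Python sketch. It assumes a toy classifier: `gcb_class` and its tiny code point sets are hypothetical stand-ins for the real Grapheme_Cluster_Break and Extended_Pictographic data in the UCD, and only GB11 itself is modeled (GB9 and the other rules are omitted).

```python
ZWJ = "\u200d"

# Hypothetical, drastically reduced property tables; a real implementation
# would load Extended_Pictographic and Grapheme_Cluster_Break from the UCD.
EXT_PICT = {0x2764, 0x1F466, 0x1F467, 0x1F468, 0x1F469}  # heart, boy, girl, man, woman
EXTEND = {0xFE0F}  # VARIATION SELECTOR-16


def gcb_class(ch: str) -> str:
    """Map a character to a simplified class: ZWJ, ExtPict, Extend or Other."""
    if ch == ZWJ:
        return "ZWJ"
    cp = ord(ch)
    if cp in EXT_PICT:
        return "ExtPict"
    if cp in EXTEND:
        return "Extend"
    return "Other"


def gb11_draft(text: str, i: int) -> bool:
    """Draft rule: ExtPict Extend* ZWJ x ExtPict.

    Returns True if GB11 forbids a break between text[i-1] and text[i].
    Needs an unbounded backward scan over Extend* characters.
    """
    if i < 2 or gcb_class(text[i]) != "ExtPict" or gcb_class(text[i - 1]) != "ZWJ":
        return False
    j = i - 2
    while j >= 0 and gcb_class(text[j]) == "Extend":
        j -= 1
    return j >= 0 and gcb_class(text[j]) == "ExtPict"


def gb11_local(text: str, i: int) -> bool:
    """Proposed rule: ZWJ x ExtPict. Only the two adjacent characters matter."""
    return i >= 1 and gcb_class(text[i - 1]) == "ZWJ" and gcb_class(text[i]) == "ExtPict"


family = "\U0001F468\u200d\U0001F469"  # MAN, ZWJ, WOMAN
arabic = "\u0628\u200d\U0001F469"      # ARABIC LETTER BEH, ZWJ, WOMAN

for name, s in [("family", family), ("arabic", arabic)]:
    print(name, gb11_draft(s, 2), gb11_local(s, 2))
# family True True
# arabic False True
```

Under the draft rule the Arabic sequence keeps a break before the emoji, which is what the Arabic objection asked for; under the local rule it does not, but the decision depends only on the pair of characters at the boundary, which is the simplification Manish describes.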

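Mark's proposed change to the Extend definition can likewise be sketched. This is a minimal sketch, assuming a segmenter that applies only GB9 (no break before Extend) and otherwise breaks; `is_extend_new` and `clusters` are hypothetical helpers, and Grapheme_Extend is approximated from the General_Category reported by Python's `unicodedata` module rather than read from the UCD.

```python
import unicodedata

# Emoji_Modifier=Yes in emoji-data.txt: the five skin tone modifiers
# U+1F3FB..U+1F3FF (EMOJI MODIFIER FITZPATRICK TYPE-1-2 .. TYPE-6).
EMOJI_MODIFIERS = range(0x1F3FB, 0x1F400)


def is_extend_new(ch: str) -> bool:
    """Approximation of the proposed Extend definition.

    Grapheme_Extend is approximated by General_Category Mn or Me; the real
    property also includes Other_Grapheme_Extend, and the
    'and not GCB = Virama' exclusion is not modeled here.
    """
    return unicodedata.category(ch) in ("Mn", "Me") or ord(ch) in EMOJI_MODIFIERS


def clusters(text: str) -> list[str]:
    """Segment using only GB9 (x Extend) and GB999 (break everywhere else)."""
    out: list[str] = []
    for ch in text:
        if out and is_extend_new(ch):
            out[-1] += ch  # GB9: never break before Extend
        else:
            out.append(ch)  # GB999: otherwise break
    return out


print(clusters("\U0001F44D\U0001F3FD"))  # THUMBS UP SIGN + medium skin tone
print(clusters("A\U0001F3FD"))           # degenerate <non-emoji-base, modifier>
```

Because the skin tone modifiers become Extend, GB9 alone keeps them attached to the preceding character, so GB10 and the E_Base/E_Modifier values are no longer needed. The cost is that the degenerate <non-emoji-base, skin tone modifier> case is also kept together, which Richard's hieroglyph remark suggests may be acceptable anyway.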
