Requiring typed text to be NFKC (was: Can NFKC turn valid UAX 31 identifiers into non-identifiers?)

2018-06-05 Thread Manish Goregaokar via Unicode
Following up from my previous email, one of the ideas that was brought up was that if we're going to consider NFKC forms equivalent, we should require things to be typed in NFKC. I'm a bit wary of this. As Richard brought up in th

Re: Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-04 Thread Manish Goregaokar via Unicode
Oh, looks like UAX 31 has info on how to be closed under NFC http://www.unicode.org/reports/tr31/#NFKC_Modifications -Manish On Mon, Jun 4, 2018 at 12:49 PM Manish Goregaokar wrote: > Hi, > > The Rust community is considering > adding non-ascii >

Can NFKC turn valid UAX 31 identifiers into non-identifiers?

2018-06-04 Thread Manish Goregaokar via Unicode
Hi, The Rust community is considering adding non-ascii identifiers, which follow UAX #31 (XID_Start XID_Continue*, with tweaks). The proposal also asks for identifiers to be treated as equivalent under NFKC. Are
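
A minimal sketch of what "equivalent under NFKC" could mean in practice, in Python rather than Rust. It assumes Python's `str.isidentifier` as a stand-in for the UAX #31 check; that method is close to XID_Start XID_Continue* but not identical (it also accepts a leading underscore). The helper names `same_identifier` and `stays_identifier` are hypothetical and do not come from the proposal.

```python
import unicodedata


def same_identifier(a: str, b: str) -> bool:
    """Treat two identifiers as equal when their NFKC forms match."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


def stays_identifier(name: str) -> bool:
    """Check that a name is an identifier both before and after NFKC.

    Assumption: str.isidentifier approximates the UAX #31 default syntax.
    """
    normalized = unicodedata.normalize("NFKC", name)
    return name.isidentifier() and normalized.isidentifier()


if __name__ == "__main__":
    # U+FB01 LATIN SMALL LIGATURE FI folds to the two letters "fi" under NFKC.
    print(same_identifier("\ufb01le", "file"))
    print(stays_identifier("\ufb01le"))
```

The second helper is the question the thread title asks: whether normalizing a valid identifier can ever produce something that no longer passes the identifier check.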

Re: Submissions open for 2020 Emoji

2018-04-20 Thread Manish Goregaokar via Unicode
It would also be useful if "Added to larger set" mentioned which proposal it was added to. Last December I proposed emojification for U+1F58E LEFT WRITING HAND, and that's marked as merged but it's unclear which proposal it was merged with. (Also the document isn't on L2 yet, I'm not sure why) T

Re: Unicode of Death 2.0

2018-02-18 Thread Manish Goregaokar via Unicode
Oh, also vatu. Seems like that ordering algorithm is indeed relevant. -Manish On Sat, Feb 17, 2018 at 11:57 PM, Manish Goregaokar wrote: > Ah, looking at that the OpenType `pstf` feature seems relevant, though I > cannot get it to crash with Gurmukhi (where the consonant ya is a postform) > >

Re: Unicode of Death 2.0

2018-02-18 Thread Manish Goregaokar via Unicode
Ah, looking at that the OpenType `pstf` feature seems relevant, though I cannot get it to crash with Gurmukhi (where the consonant ya is a postform) -Manish On Sat, Feb 17, 2018 at 4:40 PM, Philippe Verdy wrote: > An interesting read: > > https://docs.microsoft.com/fr-fr/typography/script- > de

Re: Unicode of Death 2.0

2018-02-17 Thread Manish Goregaokar via Unicode
Heh, I wasn't aware of the word "phala-form", though that seems Bengali-specific? Interesting observation about the vowel glyphs, I'll mention this in the post. Initially I missed this because I hadn't realized that the Bengali o vowel crashed (which made me discount this). Thanks! -Manish On

Re: Unicode of Death 2.0

2018-02-16 Thread Manish Goregaokar via Unicode
FWIW I dissected the crashing strings, it's basically all sequences in Telugu, Bengali, Devanagari where the consonant is suffix-joining (ra in Devanagari, jo and ro in Bengali, and all Telugu consonants), the vowel is not Bengali au or o / Telugu ai, and if the second consonant is ra/ro the first

Re: Emoji’s

2018-01-11 Thread Manish Goregaokar via Unicode
I submitted a proposal to emojify the left writing hand code point. -Manish On Thu, Jan 11, 2018 at 5:00 PM, Christoph Päper via Unicode < unicode@unicode.org> wrote: > jillian mestel: > > > > I was very disappointed to learn that there are no emojis of portraying > a dominant left hand. > > See

Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10)

2018-01-02 Thread Manish Goregaokar via Unicode
planning to get rid of the GAZ/EBG distinction ( >> http://www.unicode.org/reports/tr29/tr29-32.html#GB10) in any event. >> >> Mark >> >> On Mon, Jan 1, 2018 at 3:52 PM, Richard Wordingham via Unicode < >> unicode@unicode.org> wrote: >> >>> On

Re: Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10)

2018-01-02 Thread Manish Goregaokar via Unicode
tml#UTS51>]. > *and not* GCB = Virama > > Note: we are already planning to get rid of the GAZ/EBG distinction ( > http://www.unicode.org/reports/tr29/tr29-32.html#GB10) in any event. > > Mark > > On Mon, Jan 1, 2018 at 3:52 PM, Richard Wordingham via Unicode < > unicode

Unifying E_Modifier and Extend in UAX 29 (i.e. the necessity of GB10)

2017-12-31 Thread Manish Goregaokar via Unicode
In UAX 29, the GB10 rule[1] (and the WB14 rule[2]) states that we should not break before E_Modifier characters when they follow an emoji base (with optional Extend characters in between). Given that the spec is allowed to ignore degenerates, is there any value lost by merging E_Modifier and Ext
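
The GB10 rule described in this message can be sketched as a left-context check. This is a toy illustration, not an implementation of UAX 29: the emoji base set is a hypothetical three-character sample, Extend is approximated by the general categories Mn and Me, and the function name `gb10_no_break_before` is invented here. Only the E_Modifier range (U+1F3FB..U+1F3FF, the skin tone modifiers) is taken as-is.

```python
import unicodedata

# Skin tone modifiers U+1F3FB..U+1F3FF.
E_MODIFIER = range(0x1F3FB, 0x1F400)

# Assumption: a tiny sample standing in for the full emoji base set
# (waving hand, thumbs up, writing hand).
E_BASE_SAMPLE = {0x1F44B, 0x1F44D, 0x270D}


def is_extend(ch: str) -> bool:
    """Rough approximation of Grapheme_Cluster_Break=Extend (Mn or Me)."""
    return unicodedata.category(ch) in ("Mn", "Me")


def gb10_no_break_before(text: str, i: int) -> bool:
    """Return True if the GB10-style rule forbids a break before text[i]."""
    if i <= 0 or i >= len(text) or ord(text[i]) not in E_MODIFIER:
        return False
    j = i - 1
    # Skip the optional run of Extend characters before the modifier.
    while j >= 0 and is_extend(text[j]):
        j -= 1
    return j >= 0 and ord(text[j]) in E_BASE_SAMPLE


if __name__ == "__main__":
    print(gb10_no_break_before("\U0001F44B\U0001F3FD", 1))  # base + modifier
    print(gb10_no_break_before("a\U0001F3FD", 1))           # modifier after a letter
```

The second call is the kind of degenerate sequence the message refers to: a modifier that does not follow an emoji base. In this sketch it is the only case where the dedicated rule gives a different answer from simply treating the modifier as Extend.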

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-21 Thread Manish Goregaokar via Unicode
> When deleting by backspace, the usual practice is to delete one Unicode character for each key press. This seems to depend on the operating system and program involved. For example, on OSX any native text input field (Spotlight, TextEdit, etc) will delete by extended grapheme cluster. Chrome als

Re: Proposed Expansion of Grapheme Clusters to Whole Aksharas - Implementation Issues

2017-12-10 Thread Manish Goregaokar via Unicode
> GB9c: (Virama | ZWJ ) × Extend* LinkingConsonant You can also explicitly request ligatureification with a ZWJ, so perhaps this rule should be something like (Virama ZWJ? | ZWJ) × Extend* LinkingConsonant -Manish On Sat, Dec 9, 2017 at 7:16 AM, Mark Davis ☕️ via Unicode < unicode@unicode.org
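
The rule suggested in this message, (Virama ZWJ? | ZWJ) × Extend* LinkingConsonant, can be sketched as a regular-expression check on the text to the left of a candidate break. This is a sketch under stated assumptions: it covers Devanagari only, LinkingConsonant is taken to be the consonant range U+0915..U+0939, the Extend set is a hypothetical three-mark sample, and the name `gb9c_no_break_before` is invented for illustration.

```python
import re

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER

# Assumption: a small sample of Extend marks (nukta, udatta, anudatta).
EXTEND_SAMPLE = "\u093C\u0951\u0952"

# Assumption: Devanagari consonants KA..HA stand in for LinkingConsonant.
LINKING_CONSONANT = re.compile("[\u0915-\u0939]")

# Left context: (Virama ZWJ? | ZWJ) Extend*, anchored at the break position.
LEFT_CONTEXT = re.compile(f"(?:{VIRAMA}{ZWJ}?|{ZWJ})[{EXTEND_SAMPLE}]*$")


def gb9c_no_break_before(text: str, i: int) -> bool:
    """Return True if the suggested rule forbids a break before text[i]."""
    if not 0 < i < len(text):
        return False
    if not LINKING_CONSONANT.fullmatch(text[i]):
        return False
    return LEFT_CONTEXT.search(text[:i]) is not None


if __name__ == "__main__":
    # KA + VIRAMA + SSA: no break before SSA, so the conjunct stays together.
    print(gb9c_no_break_before("\u0915\u094D\u0937", 2))
    # KA + SSA with no virama: the rule does not apply.
    print(gb9c_no_break_before("\u0915\u0937", 1))
```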

Re: Counting Devanagari Aksharas

2017-04-22 Thread Manish Goregaokar via Unicode
> You cannot even > meaningfully move by single characters in most clusters, because > composing characters generally completely changes how the original > characters looked, so there's nowhere you can display the cursor. Yes, and this is one of the reasons it feels broken in Devanagari, you get c

Re: Counting Devanagari Aksharas

2017-04-21 Thread Manish Goregaokar via Unicode
. Breaking on ZWNJ seems sensible. -Manish On Fri, Apr 21, 2017 at 4:04 PM, Richard Wordingham via Unicode wrote: > On Thu, 20 Apr 2017 11:17:05 -0700 > Manish Goregaokar via Unicode wrote: > >> On Wed, Apr 19, 2017 at 4:35 PM, Richard Wordingham via Unicode >> wrote: >

Re: Counting Devanagari Aksharas

2017-04-21 Thread Manish Goregaokar via Unicode
That seems like a relatively niche use case (especially with Vedic Sanskrit) compared to having weird selection for everything else. I'm not convinced. When I use a romanized Devanagari input method (I typically do on my laptop), deleting the whole cluster is necessary anyway for things to work wel

Re: Counting Devanagari Aksharas

2017-04-20 Thread Manish Goregaokar via Unicode
don't see how intra-conjunct selection would be useful otherwise. -Manish On Thu, Apr 20, 2017 at 12:14 PM, Richard Wordingham via Unicode wrote: > On Thu, 20 Apr 2017 11:17:05 -0700 > Manish Goregaokar via Unicode wrote: > >> When given a rendered representation peopl

Re: Counting Devanagari Aksharas

2017-04-20 Thread Manish Goregaokar via Unicode
I don't think there's consensus. When given a rendered representation people seem to uniformly count conjuncts as multiple aksharas if rendered with visible halant, and as a single akshara if they are rendered conjoined. Most fonts for Devanagari these days are pretty good at conjoining consonant