An interesting read: https://docs.microsoft.com/fr-fr/typography/script-development/bengali#reor
2018-02-18 1:30 GMT+01:00 Philippe Verdy <[email protected]>:

> My opinion about this bug is that Apple's text renderer allocates a glyph
> buffer dynamically, only when needed (lazily), but that a test is missing
> for the lazy construction of this buffer, and this is causing a null
> pointer exception at run time. The buffer is not needed for most texts,
> which require no glyph substitution or reordering: a single accessor can
> then find the glyph data directly from the code point by lookup in the
> font tables.
>
> The bug actually occurs when processing the vowel that follows the ZWNJ, if
> the code assumes that a glyph buffer has already been constructed for the
> cluster in order to place the vowel over the correct glyph (which may have
> been reordered in that buffer).
>
> Microsoft's text renderer and other engines do not delay the construction
> of the glyph buffer, which can be reused for processing the rest of the
> text, provided it is correctly reset after processing a cluster.
>
>
> 2018-02-17 21:54 GMT+01:00 Manish Goregaokar <[email protected]>:
>
>> Heh, I wasn't aware of the word "phala-form", though that seems
>> Bengali-specific?
>>
>> Interesting observation about the vowel glyphs, I'll mention this in the
>> post. Initially I missed this because I hadn't realized that the Bengali o
>> vowel crashed (which made me discount this).
>>
>>
>> Thanks!
>>
>> -Manish
>>
>> On Sat, Feb 17, 2018 at 12:22 PM, Philippe Verdy <[email protected]>
>> wrote:
>>
>>> I would have liked your invented term "left-joining consonants" to use
>>> the usual name "phala forms" (representing RA or JA/JO after a virama,
>>> generally named "raphala" or "japhala/jophala").
>>>
>>> The reason this bug does not occur with some vowels is that these are
>>> two-part vowels, which are first decomposed into two separate glyphs that
>>> are reordered in the glyph buffer, whereas other vowels need no such
>>> prior mapping and keep their initial direct mapping from their code
>>> points in fonts. So this has to do with the way the ZWNJ handling looks
>>> for the vowel glyphs in the glyph buffer rather than in the initial code
>>> point buffer: there is some desynchronization, and more probably an
>>> uninitialized data field (for the lookup made in handling ZWNJ) if no
>>> vowel decomposition was done. The same data field is correctly
>>> initialized when it is the first consonant that takes an alternate form
>>> before a virama, as in most Indic consonant clusters, because a glyph
>>> buffer is then created.
>>>
>>> Now we have some hints about why the bug does not occur in Kannada or
>>> Khmer: a glyph buffer is always created there, but a shortcut was made in
>>> Devanagari, Bengali, and Telugu to process clusters faster, working
>>> directly on the code point stream without always having to create a
>>> glyph buffer (which is what allows reordering glyphs before positioning
>>> them).
>>>
>>> So it seems related to the fact that OpenType fonts do not need to
>>> include rules for glyph substitution, and that the phala forms are
>>> represented without any glyph substitution, by mapping them directly in a
>>> separate table for the consonants. Because there has been no
>>> code-to-glyph substitution, the glyph buffer is not created, but then,
>>> when processing the ZWNJ, the renderer looks for data in a glyph buffer
>>> that has not yet been initialized (and this is specific to the renderers
>>> implemented by Apple in iOS and macOS). This bug does not occur if
>>> another text rendering engine is used (e.g. in non-Apple web browsers).
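
To make the lazy-buffer hypothesis concrete, here is a toy Python sketch of a glyph buffer that is only allocated on demand, with and without the check that is supposedly missing. Everything in it is invented for illustration (the `ClusterShaper` class, its method names, the `glyph:`/`mark:` strings); it is not Apple's code, and it only mirrors the guess made in this thread, not a confirmed cause.

```python
from typing import List, Optional


class ClusterShaper:
    """Hypothetical shaper used only to illustrate the guess in this thread."""

    def __init__(self) -> None:
        # Allocated lazily: stays None on the fast path, where glyphs are
        # looked up directly from code points and no buffer is needed.
        self.glyph_buffer: Optional[List[str]] = None

    def _ensure_buffer(self, codepoints: str) -> List[str]:
        # The guard that, per the hypothesis, is missing on one code path.
        if self.glyph_buffer is None:
            self.glyph_buffer = [f"glyph:{ord(c):04X}" for c in codepoints]
        return self.glyph_buffer

    def attach_vowel_unchecked(self, vowel: str) -> List[str]:
        # Buggy variant: assumes an earlier substitution step built the
        # buffer. If none ran, glyph_buffer is still None and this fails.
        self.glyph_buffer.append(f"mark:{ord(vowel):04X}")
        return self.glyph_buffer

    def attach_vowel_checked(self, codepoints: str, vowel: str) -> List[str]:
        # Fixed variant: build the buffer on demand before touching it.
        buffer = self._ensure_buffer(codepoints)
        buffer.append(f"mark:{ord(vowel):04X}")
        return buffer
```

In Python the unchecked variant raises an `AttributeError` on a fresh shaper; in a C or C++ renderer the analogous mistake would be a null pointer dereference, which matches the kind of crash being discussed.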
>>>
>>>
>>> 2018-02-16 19:44 GMT+01:00 Manish Goregaokar <[email protected]>:
>>>
>>>> FWIW I dissected the crashing strings, it's basically all <consonant,
>>>> virama, consonant, zwnj, vowel> sequences in Telugu, Bengali, Devanagari
>>>> where the consonant is suffix-joining (ra in Devanagari, jo and ro in
>>>> Bengali, and all Telugu consonants), the vowel is not Bengali au or o /
>>>> Telugu ai, and if the second consonant is ra/ro the first one is not also
>>>> ra/ro (or ro-with-line-through-it).
>>>>
>>>> https://manishearth.github.io/blog/2018/02/15/picking-apart-the-crashing-ios-string/
>>>>
>>>> -Manish
>>>>
>>>> On Thu, Feb 15, 2018 at 10:58 AM, Philippe Verdy via Unicode <
>>>> [email protected]> wrote:
>>>>
>>>>> That's probably not a bug in Unicode but in the macOS/iOS text
>>>>> renderers, with some fonts using advanced composition features.
>>>>>
>>>>> Similar bugs could as well affect the new advanced features added in
>>>>> Windows or Android to support multicolored emojis, variable fonts,
>>>>> contextual glyph transforms, style variants, or more font formats (not
>>>>> just OpenType). The bug may also be in the graphic renderer (incorrect
>>>>> clipping when drawing the glyph into the glyph cache, with buffer
>>>>> overflows possibly caused by incorrectly computed splines), or it could
>>>>> be in the display driver (or in the hardware accelerator having some
>>>>> limitations on the complexity of multipolygons to fill and to
>>>>> antialias), causing some infinite recursion loop, or too deep a
>>>>> recursion exhausting the stack limit.
>>>>>
>>>>> Finally, the bug could be in the OpenType hinting engine moving some
>>>>> points outside the clipping area (the math theory may say that such
>>>>> placement of a point outside the clipping area is impossible, but
>>>>> various mathematical simplifications and shortcuts are used to simplify
>>>>> or accelerate the rendering, at the price of some quirks). Even the SVG
>>>>> standard (in constant evolution) could be affected as well in its
>>>>> implementation.
>>>>>
>>>>> There are tons of possible bugs here.
>>>>>
>>>>> 2018-02-15 18:21 GMT+01:00 James Kass via Unicode <[email protected]>:
>>>>>
>>>>>> This article:
>>>>>> https://techcrunch.com/2018/02/15/iphone-text-bomb-ios-mac-crash-apple/?ncid=mobilenavtrend
>>>>>>
>>>>>> The single Unicode symbol referred to in the article results from a
>>>>>> string of Telugu characters. The article doesn't list or display the
>>>>>> characters, so Mac users can visit the above link. A link in one of
>>>>>> the comments leads to a page which does display the characters.
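
Manish's description of the crashing sequences can be written down as a small predicate. The following is only a rough Python transcription of that description, under several assumptions of mine: the code point ranges are approximate block ranges (they include a few unassigned positions), "jo" is taken to be BENGALI LETTER YA (U+09AF), "ro-with-line-through-it" is taken to be U+09F0, the "suffix-joining" condition is applied to the second consonant, and the ra/ro exclusion is applied to Devanagari and Bengali only. The names `Script`, `SCRIPTS` and `matches_description` are hypothetical, and this is not a verified crash detector.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

ZWNJ = "\u200C"


@dataclass(frozen=True)
class Script:
    consonants: range                         # approximate consonant range
    vowel_signs: range                        # approximate dependent vowel range
    virama: str
    suffix_joining: Optional[FrozenSet[str]]  # None means "all consonants"
    excluded_vowels: FrozenSet[str]           # vowels said not to crash
    ra_like: FrozenSet[str]                   # ra/ro letters for the last rule


SCRIPTS = {
    "Devanagari": Script(range(0x0915, 0x093A), range(0x093E, 0x094D), "\u094D",
                         frozenset({"\u0930"}), frozenset(),
                         frozenset({"\u0930"})),
    "Bengali": Script(range(0x0995, 0x09BA), range(0x09BE, 0x09CD), "\u09CD",
                      frozenset({"\u09AF", "\u09B0"}),   # "jo" (assumed), "ro"
                      frozenset({"\u09CB", "\u09CC"}),   # o, au
                      frozenset({"\u09B0", "\u09F0"})),
    "Telugu": Script(range(0x0C15, 0x0C3A), range(0x0C3E, 0x0C4D), "\u0C4D",
                     None, frozenset({"\u0C48"}),        # ai
                     frozenset()),
}


def matches_description(text: str) -> Optional[str]:
    """Return the script name if text fits the described pattern, else None."""
    if len(text) != 5:
        return None
    first, virama, second, zwnj, vowel = text
    if zwnj != ZWNJ:
        return None
    for name, s in SCRIPTS.items():
        first_ok = ord(first) in s.consonants or first in s.ra_like
        second_ok = ord(second) in s.consonants and (
            s.suffix_joining is None or second in s.suffix_joining)
        vowel_ok = ord(vowel) in s.vowel_signs and vowel not in s.excluded_vowels
        # Described exception: second consonant ra/ro and first one also ra/ro.
        double_ra = second in s.ra_like and first in s.ra_like
        if (first_ok and virama == s.virama and second_ok
                and vowel_ok and not double_ra):
            return name
    return None
```

For instance, `matches_description("\u0C1C\u0C4D\u0C1E\u200C\u0C3E")` returns `"Telugu"`, while swapping the vowel for Telugu ai (U+0C48) returns `None`, in line with the exceptions listed in the message.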

