2011/8/24 John Hudson <j...@tiro.ca>:
> Philippe Verdy wrote:
>
>> Rereading closely the OpenType spec...
>
> I suggest you read also the script-specific OT layout specifications.
>
> http://www.microsoft.com/typography/SpecificationsOverview.mspx
>
> You'll note, for example, that the Arabic font spec doesn't even mention
> BiDi, because it is assumed that this has been resolved before glyph runs
> for OTL processing are even identified. This makes sense to me because BiDi
> is a character-centric operation.
>
> The Microsoft font specs describe what Uniscribe (and DWrite) do with text
> and fonts for particular scripts, and there may be some differences in other
> implementations. For example, Uniscribe performs invalid mark sequence
> checks that others, preferring to see this as a task for spellcheckers, do
> not. But the glyph selection and positioning results should be the same
> across implementations. Font makers need to know how text is processed and
> OTL features applied in order to make fonts that work with resulting glyph
> runs and input strings. Changing the point in the glyph string resolution
> when BiDi is applied breaks everything. It's a complete non-starter.

I had already read these subspecs. And I think you're wrong, because the list of glyphs in resolved order, after all ligature substitutions and glyph breaking (for Indic scripts), has an order that is completely independent of the logical reading order of the characters. You can perfectly well run the BiDi algorithm after the glyph substitutions. All the BiDi algorithm does is delimit runs of characters that are to be rendered in one direction or the other. The same limits will also be boundaries between the associated runs of glyph ids. There is in fact absolutely no need for the BiDi algorithm in order to process the glyph substitutions, because they will be performed in exactly the same way. The two algorithms are in fact completely independent of each other, at least if you don't need to apply substitutions that span distinct runs.

However, there is a dependency between the BiDi algorithm and glyph positioning, because each RTL or LTR run needs its own left-side bearing and its own right-side bearing so that the runs are spaced correctly relative to each other. It also determines the direction in which you advance the coordinates along the baseline when positioning the fully resolved glyph ids. This requires knowing the principal direction of each run of glyph ids.

In fact you have not demonstrated that this concept would break anything at all, except ligatures between RTL and LTR characters, i.e. between resolved RTL and LTR glyphs, something that can only occur across a boundary between a resolved RTL run of glyph ids and a resolved LTR run of glyph ids. But I was told that OpenType layout does not support such a thing, or that this possible behavior is for now undocumented in the OpenType specs, whereas this is not the case for AAT layout and Graphite layout. I admit that this would cause problems for positioning such ligature glyphs, which would have an ambiguous direction because they would belong to two successive directional runs at the character level.
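
To make the independence claim concrete, here is a minimal toy sketch in Python. It is not how any real OpenType engine works: the `DIRECTION` table stands in for the full BiDi algorithm (only strong directions, no embedding levels), the `LIGATURES` table stands in for GSUB, and all glyph names and function names are made up for the illustration. Under the assumption that no rule spans two directional runs, resolving the runs before or after substitution gives the same glyph runs.

```python
from itertools import groupby

# Toy strong directions (assumption: stands in for the real BiDi algorithm).
DIRECTION = {"lam": "R", "alef": "R", "f": "L", "i": "L"}

# Toy ligature rules (assumption: stands in for GSUB); none spans two directions.
LIGATURES = {("lam", "alef"): "lam_alef", ("f", "i"): "f_i"}


def substitute(glyphs):
    """Greedy pair substitution in logical order.

    Returns (glyph, index of the first source character) pairs.
    """
    out, i = [], 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append((LIGATURES[pair], i))
            i += 2
        else:
            out.append((glyphs[i], i))
            i += 1
    return out


def runs_then_substitute(chars):
    """Delimit directional runs first, then substitute inside each run."""
    return [
        (d, [g for g, _ in substitute(list(group))])
        for d, group in groupby(chars, key=DIRECTION.get)
    ]


def substitute_then_runs(chars):
    """Substitute over the whole string, then delimit runs of glyph ids."""
    tagged = [(DIRECTION[chars[src]], g) for g, src in substitute(chars)]
    return [
        (d, [g for _, g in group])
        for d, group in groupby(tagged, key=lambda t: t[0])
    ]


text = ["f", "i", "lam", "alef", "f"]
print(runs_then_substitute(text))
print(substitute_then_runs(text))
```

Both calls print the same three runs for this input. The equality only holds because no rule in the toy table crosses a direction boundary, which is exactly the restriction stated above.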
As the above may not be very clear, let's suppose that you wanted to create a GSUB ligature between ARABIC LETTER LAM (resolved to RTL at the character level) and LATIN CAPITAL LETTER A (resolved to LTR at the character level by the BiDi algorithm). You would map this pair to a "LAM_A" glyph id. Technically, nothing in OpenType GSUB forbids you to do that in your font. But the OpenType engine, which needs to maintain an equivalence of boundaries between runs of characters (from BiDi) and runs of glyph ids (from the cmap, then after GSUB substitutions), will not know whether the LAM_A glyph belongs to the first run (terminated by the RTL character LAM) or to the second run (starting with the LTR character A), unless *each* GSUB rule provides an indication of where to place the new direction boundary when there was a direction boundary in the middle of the source list of glyphs before its substitution.

Yes, this is a very borderline case: I have never seen it or needed it in practice. Unicode prefers encoding a new, similar character with the opposite strong direction (for example ALEF SYMBOL for maths, which is very similar to the Hebrew letter but has the opposite direction; but here I wonder how it would form a ligature with another strong LTR character that is not a diacritic, even if there is evidence that such a pair can be positioned with GPOS, i.e. kerned).

The only assumption is that GSUB will preserve the boundaries between runs of characters that are in the same direction; but of course it does not always preserve the boundaries between logical character clusters. This may explain your concern that this could potentially break something, but only if you don't care about unambiguously preserving the boundaries between directional runs, and you have no data hint in the substitution rules about where to reposition the boundary after the substitution has occurred.
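
The LAM_A case can be sketched the same way, again as a toy model in Python rather than anything taken from the OpenType specs. The per-rule `hint` ("first" or "second") is purely hypothetical: it is the kind of data the rule would have to carry, and no such field exists in GSUB as far as I know. The names `shape`, `DIRECTION` and the glyph names are also invented for the example.

```python
# Toy strong directions (assumption: stands in for the real BiDi algorithm).
DIRECTION = {"lam": "R", "A": "L", "b": "L"}


def shape(chars, ligatures):
    """Return (glyph, direction) pairs in logical order.

    `ligatures` maps an input pair to (ligature glyph, hint). The hint is
    hypothetical: "first" or "second" says which side's run keeps the
    ligature; None means the rule gives no indication.
    """
    out, i = [], 0
    while i < len(chars):
        pair = tuple(chars[i:i + 2])
        if pair in ligatures:
            glyph, hint = ligatures[pair]
            first_dir, second_dir = DIRECTION[pair[0]], DIRECTION[pair[1]]
            if first_dir == second_dir:
                direction = first_dir        # no boundary inside the ligature
            elif hint == "first":
                direction = first_dir        # boundary moves after the ligature
            elif hint == "second":
                direction = second_dir       # boundary moves before the ligature
            else:
                direction = "ambiguous"      # the engine cannot decide
            out.append((glyph, direction))
            i += 2
        else:
            out.append((chars[i], DIRECTION[chars[i]]))
            i += 1
    return out


text = ["lam", "A", "b"]
print(shape(text, {("lam", "A"): ("lam_A", None)}))
print(shape(text, {("lam", "A"): ("lam_A", "second")}))
```

Without a hint the ligature glyph ends up in neither run; with the hint set to "second" it joins the LTR run that follows it. Which choice is right for positioning is a separate question that this sketch does not try to answer.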