"Simon Cozens" <si...@simon-cozens.org>: > This is possibly a daft question, but... > > In traditional TeX, character tokens are processed and put into boxes > individually, with fairly primitive ligature tables. Obviously XeTeX > doesn't > do this, using Harfbuzz (or ICU or whatever) to do the shaping and > layout. > > My question is, if you're not "showing" individual characters to the > shaping > engine for it to consider, what defines how big a string of > characters to > shape at a time? Does XeTeX break at the "word" level and then shape > a word, > and if so what defines a word? (Chinese has no word breaks!) Or does > it shape > an entire paragraph of text at a time (!) and then box up the glyphs > individually? Or...? > > (I've tried starting at layoutChars in XeTeXLayoutInterface.cpp and > working > backwards but I can't understand where I end up: measure_native_node > shapes a > node, but what's a node?)
I don't know exactly how Harfbuzz and/or ICU work, but:

- Characters are never put into individual boxes;
- Whatever shaping must take place is defined by sequences of characters:
  you look at each character, see whether it must be processed (possibly as
  part of a larger sequence), move on to the next character (unless it has
  already been processed as part of a sequence), and so on.

Most of the rules you must follow to process glyphs are explained here:

http://www.microsoft.com/typography/otspec/

So your question (as I understand it) is really about processing OT fonts.
The sequences of characters I have mentioned (your strings of characters) are
defined in the font itself (for complex sequences, see e.g. contextual
lookups).

As for a node, it is whatever TeX processes internally to build a page: it
can be a character, a kern, a whatsit, a box...

Best,
Paul
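
To illustrate the character-by-character scan described above, here is a minimal Python sketch. It is only a toy under stated assumptions: the LIGATURES table, the glyph names, and the shape() function are hypothetical, and the "longest match first" rule is a simplification. It is not how XeTeX, Harfbuzz, or ICU are actually implemented; a real OpenType font stores such rules as lookups on glyph IDs and applies them in the order the font defines.

```python
# Hypothetical ligature table: character sequences mapped to made-up glyph
# names. A real font defines these rules itself (on glyph IDs, not characters).
LIGATURES = {
    ("f", "f", "i"): "f_f_i",
    ("f", "i"): "f_i",
    ("f", "l"): "f_l",
}
MAX_LEN = max(len(seq) for seq in LIGATURES)


def shape(text):
    """Scan left to right, consuming the longest known sequence at each step."""
    glyphs = []
    i = 0
    while i < len(text):
        # Try the longest candidate sequence first, down to length 2.
        for n in range(min(MAX_LEN, len(text) - i), 1, -1):
            seq = tuple(text[i:i + n])
            if seq in LIGATURES:
                glyphs.append(LIGATURES[seq])
                i += n  # skip the characters consumed by the sequence
                break
        else:
            # No sequence starts here: the character stands for itself.
            glyphs.append(text[i])
            i += 1
    return glyphs
```

With this table, shape("office") returns ['o', 'f_f_i', 'c', 'e']: the "ffi" run is consumed as one sequence, so its characters are not looked at again individually.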