On Mon, May 17, 2010 at 13:01, Stephan Hennig wrote:
> On 17.05.2010 00:55, Mojca Miklavec wrote:
>
>>> From a readability point of view 'lava-bo' is better for me since one
>>> can guess the rest of the word (whereas you can't guess the rest of la-)
>>
>> <not-to-be-taken-seriously>
>> Oh, and yes ... I was already wondering when somebody will come up
>> with the idea to extend TeX with tolerances for preferable breaking
>> points in addition to the allowed ones :) :) :)
>> </not-to-be-taken-seriously>
>
> Incidentally, I've had a mail conversation about this with Taco and Werner a
> couple of weeks ago. The good news is, I think Taco has this on his list.
> Here's a sketch of the approach as I understand it (ignoring libhnj for
> now).
>
> Hyphenation points can be weighted by applying multiple pattern sets in
> parallel that have different weights attached. That is, if a match exists
> in, e.g., a compound word pattern set, then that hyphenation point will be
> weighted higher than a regular hyphenation point. If concurring pattern
> sets find a match, the highest weight wins.
>
> Consider these pattern sets
>
> * regular pattern set with an attached weight of 10:
>
>   n1n a1d
>
> * compound word pattern set with an attached weight of 20:
>
>   en1nad
>
> and the compound word "Tannennadel" (fir needle). The regular pattern set
> has matches
>
>   Tan-nen-na-del
>
> weighting each hyphenation point equally (10 or whatever). Compound word
> patterns find the match
>
>   Tannen-nadel
>
> weighting that match 20. Finally, during paragraph breaking, hyphenation
> weights will be
>
>   Tan-nen-na-del
>      10  20 10
>
> Therefore breaking the word at the word compound Tannen-nadel will be
> (slightly) preferred.

Thanks for the really nice outline. I didn't mean it too seriously, but
now I have a "problem": I'll have to find a list of preferred hyphenation
points somewhere, while I don't even understand our exact rules :)

It seems that at some point we'll have to start splitting LuaTeX-specific
patterns (with advanced features) from regular ones (which might already
be the case - I have a feeling that Hungarian might have an improved set
of patterns that could be used in LuaTeX and only in LuaTeX).

Mojca
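
For illustration only, here is a minimal Python sketch of the weighting scheme outlined in the quoted message, applied to the "Tannennadel" example. It is not LuaTeX code and does not reflect any actual implementation: the function names (parse_pattern, break_points, weighted_breaks) are hypothetical, and the matcher is a simplified Liang-style one that ignores word-boundary dots and the minimum left/right fragment lengths.

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as 'en1nad' into its letters and inter-letter digits."""
    letters = re.sub(r"\d", "", pattern)
    digits = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            digits[pos] = int(ch)   # digit sits before letter number `pos`
        else:
            pos += 1
    return letters, digits


def break_points(word, patterns):
    """Positions i where a break between word[:i] and word[i:] is allowed."""
    word = word.lower()
    levels = [0] * (len(word) + 1)
    for pattern in patterns:
        letters, digits = parse_pattern(pattern)
        start = word.find(letters)
        while start != -1:
            for offset, digit in enumerate(digits):
                # The highest digit wins at each position, as in Liang's scheme.
                levels[start + offset] = max(levels[start + offset], digit)
            start = word.find(letters, start + 1)
    # Odd levels allow a break; breaks at the word edges are never reported.
    return {i for i, level in enumerate(levels)
            if level % 2 == 1 and 0 < i < len(word)}


def weighted_breaks(word, pattern_sets):
    """Apply every (patterns, weight) pair; the highest weight wins per position."""
    weights = {}
    for patterns, weight in pattern_sets:
        for i in break_points(word, patterns):
            weights[i] = max(weights.get(i, 0), weight)
    return weights


# Pattern sets and weights taken from the example in the quoted message.
regular = (["n1n", "a1d"], 10)
compound = (["en1nad"], 20)

weights = weighted_breaks("Tannennadel", [regular, compound])
print(sorted(weights.items()))   # [(3, 10), (6, 20), (8, 10)]
```

Position 6 is the compound boundary Tannen-nadel. Both pattern sets match there, and the higher weight (20) is kept, which corresponds to the "highest weight wins" rule in the outline.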
