On 8/10/15 16:29, Bruno Le Floch wrote:
Here's a shorter example which hyphenates with cmr12 (in pdfTeX/XeTeX)
but not with the font copied from David's example: hyphenation is lost
when closing the hbox, as can be seen by adding the appropriate
\tracingonline=1\showboxbreadth=99 and \showlists just before the
closing brace.
I have no idea why hyphenation is lost, though. As far as I can tell,
in David's example hyphenation is lost as soon as the text is first
broken into lines inside a \vbox, whereas with cmr12 hyphenation
survives repeated unboxing.
--
\ifx\XeTeXversion\undefined
\font\x=cmr12
\else
\font\x="[lmroman12-regular]:mapping=tex-text"
\fi
\x
\setbox0\hbox{XXXXXXXXXXXXX just a few normal words to fill up the line
up to my x x x zzzzzz\-zzzzz}
\unhbox0
\bye
OK, I think I see what's happening here. When xetex finishes building an
\hbox, it will drop any discretionaries that occur directly between two
adjacent runs of characters that use the same OpenType font, and merge
the preceding and following runs into a single node.
It does this so that OpenType shaping features (ligatures, kerning, or
more advanced contextual features...) will apply correctly across the
whole word, rather than being broken at the (presumed unused)
discretionary break.
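To see the merge happen, something like the following should work. It is an untested sketch: the \scrollmode and \showboxdepth settings are my own additions, so that TeX doesn't stop at each \show command and so that the contents of box 0 are actually printed. I'd expect the first trace to show a \discretionary between the two runs of z's, and the second to show a single native word with no discretionary in it.

\font\x = "[lmroman12-regular]"
\x
\scrollmode % report the \show... output without stopping for input
\tracingonline=1 \showboxbreadth=99 \showboxdepth=99
\setbox0\hbox{zzzzzz\-zzzzz\showlists} % the list before the box is finished
\showbox0 % the finished box, after xetex has merged the runs
\bye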
The trouble here is that when the \hbox is subsequently unboxed, the
discarded discretionary can't be reintroduced. So when the text from the
\hbox is then used to form a new paragraph, the merged word is subject
only to automatic hyphenation, and the explicit break point is gone.
I suppose to fix this, we'll need to keep track of discretionaries that
were "elided" from native_word nodes, rather than just discarding them
completely.
A possible workaround would be to define \- such that it always breaks
xetex's native_word nodes; something like this might work:
\def\-{\leavevmode \kern0pt \discretionary{-}{}{}}
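Applied to Bruno's test file, that would look like this. Again, this is an untested sketch. If the workaround does what I think, the paragraph should break at "zzzzzz-" in XeTeX just as it does with cmr12. In pdfTeX the extra \kern0pt should be harmless.

\ifx\XeTeXversion\undefined
\font\x=cmr12
\else
\font\x="[lmroman12-regular]:mapping=tex-text"
\fi
% proposed workaround: the \kern0pt ends the native_word before the
% discretionary, so it is not merged away when the \hbox is finished
\def\-{\leavevmode \kern0pt \discretionary{-}{}{}}
\x
\setbox0\hbox{XXXXXXXXXXXXX just a few normal words to fill up the line
up to my x x x zzzzzz\-zzzzz}
\unhbox0
\bye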
This means that explicit discretionary hyphens will interfere with
ligatures and kerning (etc), but OTOH they already do that in standard
TeX, AFAICT:
\font\x = cmr12
\x AV office \par % with kerning and ligatures
\x A\-V of\-fice \par % no AV kern, only the "fi" ligature
\end
In comparison, with XeTeX (and without the extra \def suggested above):
\font\x = "[lmroman12-regular]"
\x AV office \par % with kerning and ligatures
\x A\-V of\-fice \par % typesets identically
\end
This is generally considered a feature rather than a bug.
But the loss of explicit discretionaries when hboxing and then unhboxing
text is clearly a problem that we should figure out how to fix.
JK
--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
http://tug.org/mailman/listinfo/xetex