On 8/10/15 16:29, Bruno Le Floch wrote:
Here's a shorter example which hyphenates with cmr12 (in pdfTeX/XeTeX)
but not with the font copied from David's example: hyphenation is lost
when closing the hbox, as can be seen by adding the appropriate
\tracingonline=1\showboxbreadth=99 and \showlists just before the
closing brace.

I have no idea why hyphenation is lost, though.  As far as I can tell,
in David's example hyphenation is lost once the text is first broken
into lines when put into a \vbox, while with cmr12 hyphenation is kept
through further unboxings.

--


\ifx\XeTeXversion\undefined
\font\x=cmr12
\else
\font\x="[lmroman12-regular]:mapping=tex-text"
\fi

\x

\setbox0\hbox{XXXXXXXXXXXXX just a few normal words to fill up the line
   up to my x x x zzzzzz\-zzzzz}

\unhbox0

\bye


OK, I think I see what's happening here. When xetex finishes building an \hbox, it will drop any discretionaries that occur directly between two adjacent runs of characters that use the same OpenType font, and merge the preceding and following runs into a single node.

It does this so that OpenType shaping features (ligatures, kerning, or more advanced contextual features...) will apply correctly across the whole word, rather than being broken at the (presumed unused) discretionary break.

The trouble here is that when the \hbox is subsequently unboxed, xetex can't reintroduce the discretionary that was discarded. So when the text from the \hbox is then used in forming a new paragraph, it only gets automatic hyphenation applied, and the explicit break point is gone.
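
A minimal sketch of how to check this, along the lines Bruno describes. It assumes plain XeTeX and that the font from his example can be found; \scrollmode and \showboxdepth are additions here so that the dumps are complete and don't stop the run:

  \font\x="[lmroman12-regular]:mapping=tex-text"
  \x
  \scrollmode          % don't pause at each \show... command
  \tracingonline=1     % dumps go to the terminal as well as the log
  \showboxbreadth=99 \showboxdepth=99
  \setbox0\hbox{zzzzzz\-zzzzz\showlists}  % list while the hbox is open
  \showbox0            % the finished box
  \bye

If the explanation above is right, the \showlists dump should still contain the \discretionary, while the \showbox0 dump should show the two runs merged into one node with no discretionary between them.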

I suppose to fix this, we'll need to keep track of discretionaries that were "elided" from native_word nodes, rather than just discarding them completely.

A possible workaround would be to redefine \- so that it always breaks xetex's native_word nodes; something like this might work:

  \def\-{\leavevmode \kern0pt \discretionary{-}{}{}}
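
Dropped into Bruno's test file, that might look like the following. This is an untested sketch that assumes the same font setup as his example; only the XeTeX branch is kept:

  \font\x="[lmroman12-regular]:mapping=tex-text"
  \x
  % workaround: the zero kern keeps the discretionary from sitting
  % directly between two native_word runs
  \def\-{\leavevmode \kern0pt \discretionary{-}{}{}}
  \setbox0\hbox{XXXXXXXXXXXXX just a few normal words to fill up the line
     up to my x x x zzzzzz\-zzzzz}
  \unhbox0
  \bye

If the workaround does its job, the explicit break in zzzzzz\-zzzzz should survive the \unhbox and be available when the paragraph is broken into lines.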

With that definition, explicit discretionary hyphens will interfere with ligatures and kerning (etc.), but OTOH they already do that in standard TeX, AFAICT:

  \font\x = cmr12
  \x AV office \par      % with kerning and ligatures
  \x A\-V of\-fice \par  % no AV kern, only the "fi" ligature
  \end

In comparison, with XeTeX (and without the extra \def suggested above):

  \font\x = "[lmroman12-regular]"
  \x AV office \par      % with kerning and ligatures
  \x A\-V of\-fice \par  % typesets identically
  \end

Keeping kerning and ligatures intact across \- like this is generally considered a feature rather than a bug.

But the loss of explicit discretionaries when hboxing and then unhboxing text is clearly a problem that we should figure out how to fix.

JK


