On 5 Nov 2011, at 10:24, Akira Kakuto wrote:

> Dear Heiko,
> 
>>>>>> Conclusion:
>>>>>> * The encoding mess with 8-bit characters remains even with XeTeX.
> 
> I have disabled the re-encoding of PDF strings to UTF-16 in xdvipdfmx: TL trunk 
> r24508.
> Now
> /D<c3a46e6368c3b872>
> and
> /Names[<c3a46e6368c3b872>7 0 R]
> 
> Thanks,
> Akira

Unfortunately, I have not had time to follow this thread in detail or 
investigate the issue properly, but I'm concerned this change may break other 
things that currently work and that rely on the conversion between the encoding 
form used in \specials and the representation needed in PDF.

However, by way of background: xetex was never intended to be a tool for 
reading and writing arbitrary binary files. It is a tool for processing text, 
and is specifically based on Unicode as the encoding for text, with UTF-8 being 
its default/preferred encoding form for Unicode, and (more importantly) the 
ONLY encoding form that it uses to write output files. It's possible to READ 
other encoding forms (UTF-16), or even other codepages, and have them mapped to 
Unicode internally, but output is always written as UTF-8.
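As a rough illustration of that "read anything, write UTF-8" model, here is a small Python sketch. It is not xetex code; the function name transcode_to_utf8 is made up for the example, and the sample word is simply the text whose UTF-8 bytes appear in Akira's hex strings above.

```python
# Illustrative model only (not xetex source): input in any supported
# encoding is mapped to Unicode internally, and output is always UTF-8.

def transcode_to_utf8(raw: bytes, input_encoding: str) -> bytes:
    """Hypothetical helper: decode input bytes, re-emit them as UTF-8."""
    text = raw.decode(input_encoding)   # map the input to Unicode characters
    return text.encode("utf-8")         # the only encoding form used for output

sample = "änchør"
from_utf16 = transcode_to_utf8(sample.encode("utf-16-le"), "utf-16-le")
from_latin1 = transcode_to_utf8(sample.encode("latin-1"), "latin-1")

# Whatever the input encoding was, the bytes written out are identical.
print(from_utf16 == from_latin1)   # True
print(from_latin1.hex())           # c3a46e6368c3b872
```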

Now, this should include not only .log file and \write output, but also text 
embedded in the .xdv output using \special. Remember that \special basically 
writes a sequence of *characters* to the output, and in xetex those characters 
are *Unicode* characters. So my expectation would be that arbitrary Unicode 
text can be written using \special, and will be represented using UTF-8 in the 
argument of the xxxN operation in .xdv. If that \special is destined to be 
converted to a fragment of PDF data by the xdv-to-pdf output driver 
(xdvipdfmx), and needs a different encoding form, I'd expect the driver to be 
responsible for that conversion.
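To make the kind of driver-side conversion I have in mind concrete, here is a minimal Python sketch. It assumes the target is a PDF text string written as UTF-16BE with a leading byte order mark; the helper name is hypothetical, and this is not a description of what xdvipdfmx actually does internally.

```python
import codecs

def special_utf8_to_pdf_hex_string(payload: bytes) -> str:
    """Hypothetical helper: turn the UTF-8 bytes found in a special into
    a PDF hex string holding UTF-16BE text with a byte order mark."""
    text = payload.decode("utf-8")                        # bytes -> Unicode
    utf16 = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return "<" + utf16.hex() + ">"

# The UTF-8 bytes from the /D<...> example quoted above.
payload = bytes.fromhex("c3a46e6368c3b872")
print(special_utf8_to_pdf_hex_string(payload))
# <feff00e4006e0063006800f80072>
```

The point of the sketch is only that the conversion is a property of the output format, so it belongs in the driver and not in the macro package.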

What I would NOT expect to work is for a TeX macro package to generate 
arbitrary binary data (byte streams) and expect these to be passed unchanged to 
the output. I suspect that's what Heiko's macros do, and it worked in 
pdftex where "tex character" == "byte", but it's problematic when "tex 
character" == "Unicode character".
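The difference is easy to see in a small Python model. This is only a sketch of the two interpretations, with made-up function names; it does not reproduce either engine's code.

```python
def write_as_bytes(char_codes):
    """pdftex-like model: each TeX character code is written as one byte."""
    return bytes(char_codes)

def write_as_utf8(char_codes):
    """xetex-like model: each TeX character code is a Unicode code point,
    and the output is the UTF-8 encoding of those code points."""
    return "".join(chr(c) for c in char_codes).encode("utf-8")

# A macro package trying to emit five arbitrary "binary" byte values.
binary = [0x00, 0x7F, 0x80, 0xE4, 0xFF]

print(write_as_bytes(binary).hex())   # 007f80e4ff
print(write_as_utf8(binary).hex())    # 007fc280c3a4c3bf
```

Codes below 0x80 come through unchanged, but every code from 0x80 upward becomes a two-byte UTF-8 sequence, so the byte stream the macros intended is not what reaches the output.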

JK




