Hi,
On 13.10.13 22:33, Karcher, Glenn wrote:
Hi,
I am having a problem when attempting to output a string containing Unicode
characters. If the Unicode code point fits in a single byte (e.g., the
Registered Trademark symbol, U+00AE), the character is output correctly.
However, if the character needs two bytes (e.g., the Trademark symbol (TM),
U+2122), the string is generated as UTF-16BE as expected, but the output
file is rendered with the FE and FF BOM bytes and the 21 and 22 bytes each
shown as a separate single-byte character.
Is there something that I need to initialize to properly handle the UTF-16
characters (the most likely solution)? Is it a bug in PDFBox? Is it a quirk
in Reader X (least likely since I have seen the TM character being displayed
correctly in other documents)?
Any help and pointers on how to deal with this problem will be greatly
appreciated.
PDFBox doesn't support UTF-encoded text yet; see [1] for further details.
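
To make the symptom concrete, here is a minimal sketch that reproduces the
byte pattern described in the question. It uses only the Java standard
library and no PDFBox API; the class and method names (Utf16Symptom,
utf16beWithBom, hex) are made up for illustration. It only shows what the
bytes look like when each one is read as a separate character, and says
nothing about how PDFBox handles the string internally.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Symptom {

    // Encode text as UTF-16BE and prepend the FE FF byte order mark.
    static byte[] utf16beWithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_16BE);
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFE;
        out[1] = (byte) 0xFF;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    // Format bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Trademark sign, U+2122: two bytes in UTF-16BE, plus the BOM.
        byte[] tm = utf16beWithBom("\u2122");
        System.out.println(hex(tm)); // FE FF 21 22

        // Reading the same bytes one per character yields four characters
        // (the two BOM bytes, then '!' and '"') instead of one.
        String misread = new String(tm, StandardCharsets.ISO_8859_1);
        System.out.println(misread.length()); // 4

        // Registered sign, U+00AE: fits in one byte in a single-byte encoding.
        byte[] reg = "\u00AE".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(hex(reg)); // AE
    }
}
```

The four misread characters match the report: the BOM bytes FE and FF, then
21 and 22 shown as separate single-byte characters.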
SNIP
Best regards,
--Glenn Karcher
BR
Andreas Lehmkühler
[1] https://issues.apache.org/jira/browse/PDFBOX-922