Re: Problem/Question about UTF-16 characters

Andreas Lehmkühler Tue, 15 Oct 2013 11:49:34 -0700

Hi,

On 13.10.13 22:33, Karcher, Glenn wrote:

Hi,


I am having a problem when attempting to output a string containing Unicode
characters. If the character fits in a single byte (e.g., the Registered
Trademark symbol, U+00AE), it is output correctly. However, if the character
is a 2-byte value (e.g., the Trademark symbol (TM), U+2122), the string is
generated as UTF-16BE as expected, but the output file shows the FE and FF
byte order mark (BOM) bytes and the 21 and 22 bytes drawn as separate
single-byte characters.

Is there something that I need to initialize to properly handle the UTF-16 
characters (the most likely solution)?  Is it a bug in PDFBox?  Is it a quirk 
in Reader X (least likely since I have seen the TM character being displayed 
correctly in other documents)?

Any help and pointers on how to deal with this problem will be greatly 
appreciated.

PDFBox doesn't support UTF-encoded text yet; see [1] for further details.
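To see where the four stray characters come from, here is a minimal Java sketch (plain JDK, no PDFBox involved) that prints the bytes each symbol turns into. The class `Utf16BytesDemo` and its `hex` helper are hypothetical names for illustration only. It assumes, as the question describes, that the font drawing the text reads one byte per character code.

```java
import java.nio.charset.StandardCharsets;

public class Utf16BytesDemo {
    // Formats a byte array as space-separated uppercase hex pairs, e.g. "FE FF 21 22".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00AE (registered sign) fits in a single byte in ISO-8859-1.
        byte[] registered = "\u00AE".getBytes(StandardCharsets.ISO_8859_1);
        // U+2122 (trade mark sign) does not. Java's "UTF-16" charset encodes
        // big-endian and prepends a big-endian byte order mark (FE FF).
        byte[] trademark = "\u2122".getBytes(StandardCharsets.UTF_16);

        System.out.println("U+00AE -> " + hex(registered)); // AE
        System.out.println("U+2122 -> " + hex(trademark));  // FE FF 21 22

        // A font with a single-byte encoding treats each of these bytes as its
        // own character code, so four glyphs are drawn instead of one.
        System.out.println("single-byte codes drawn: " + trademark.length);
    }
}
```

The output matches the symptom in the question: one byte for U+00AE, but four bytes (BOM plus the 21 22 code unit) for U+2122, each rendered as a separate character.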

SNIP

Best regards,
--Glenn Karcher



BR
Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX-922
