Dear Tilman,

Thanks for your support. The original file is at my company and I can't retrieve it, so I made a simple one using iText with the same encoding. PDFBox can't process it either. Please check the attachment.
Thanks,
Best Regards,
Niu X

At 2015-07-25 15:42:55, "牛小伟" <[email protected]> wrote:
>Dear team:
> We are using your product PDFBox 1.6 to do text extraction.
>But when we process a document with the encoding (UniJIS-UCS2-HW-H),
>the output is unreadable, like this (????????????????????????3?????????????).
>We have tried some other ways to process it, but they don't work.
>We also have some documents with the encoding (GBK-EUC-H), and PDFBox
>handles them perfectly. I also tried PDFBox 1.8, and it didn't work either.
>I checked PDFBox's charsets, and both encodings are included.
>I don't know why one works and the other does not.
>I hope you can help with this. Thank you very much.
>
>
>Best Regards.
>
>
>Snapshot of the document's encoding:
>
>

