Re: unijis-ucs2-hw-h problems

Tilman Hausherr Sun, 26 Jul 2015 08:21:06 -0700

Hello 牛小伟,

PDF attachments don't go through on this mailing list, but your attachment landed in moderation, so I have it and could test it :-)


- it does indeed not work with 1.8
- it does work with the unreleased 2.0 version. With PDFBox, I get

"現代・起亜自動車、ハイブリッド車の世界販売台数で3位に返り咲き―韓国メディアItext! "


with Adobe Reader, I get

"現代・起亜自動車、ハイブリッド車の世界販売台数で3位に返り咲き―韓国メディアItext!"


which seems to be the same.
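
If you want to check that programmatically, here is a minimal sketch using only the Java standard library. The class and method names are hypothetical, and the two sample strings are shortened ASCII stand-ins for the ends of the outputs above, assuming the only difference is the trailing space in the PDFBox result:

```java
public class ExtractionCompare {

    // Hypothetical helper: treats two extracted strings as equal when they
    // differ only in leading or trailing ASCII whitespace (what trim() removes).
    public static boolean sameIgnoringEdgeWhitespace(String a, String b) {
        return a.trim().equals(b.trim());
    }

    public static void main(String[] args) {
        // Shortened stand-ins for the tails of the two extracted strings.
        String fromPdfBox = "Itext! ";
        String fromAdobeReader = "Itext!";

        System.out.println(sameIgnoringEdgeWhitespace(fromPdfBox, fromAdobeReader));
    }
}
```

Note that trim() only strips characters up to U+0020, so a trailing ideographic space (U+3000) would still count as a difference.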

You can try to use the 2.0 version, see
https://pdfbox.apache.org/download.cgi#scm
https://pdfbox.apache.org/2.0/getting-started.html

Although the API has changed, the part about text extraction hasn't, so give it a try. I wasn't involved with the project back in the 1.6 days. The most important change for you is that loading a document now looks like this:


PDDocument doc = PDDocument.load(....);


So give it a try. Ask again if you encounter any problems. Good luck!

Tilman

On 26.07.2015 at 16:07, 牛小伟 wrote:

Dear Tilman,
Thanks for your support. The original file is at the company,
so I can't get it. But I made a simple one using iText.
It uses the same encoding. PDFBox can't process it either.
Please check the attachment.

Thanks,
Best Regards,
Niu X





At 2015-07-25 15:42:55, "牛小伟" <[email protected]> wrote:
>Dear team,
>We are using your product PDFBox 1.6 to do text extraction.
>But when we process a document with the encoding UniJIS-UCS2-HW-H,
>the output is unreadable, like this: ????????????????????????3?????????????
>We have tried some other ways to process it, but they don't work.
>We also have some documents with the encoding GBK-EUC-H, and PDFBox
>handles those perfectly. I also tried PDFBox 1.8, and it didn't work either.
>I checked the character sets bundled with PDFBox, and both encodings are included.
>I don't know why one works and the other doesn't.
>I hope you can help with this. Thank you very much.
>
>
>Best regards.
>
>
>A snapshot of the document's encoding:
>
>






