Dear Tilman,

I have checked out the code. But when I create a PDFTextStripper:

    PDFTextStripper pts = new PDFTextStripper();

I get the error below, and I don't know why:

Exception in thread "main" java.lang.ExceptionInInitializerError
	at org.apache.pdfbox.text.PDFTextStreamEngine.<init>(PDFTextStreamEngine.java:103)
	at org.apache.pdfbox.text.PDFTextStripper.<init>(PDFTextStripper.java:194)
	at com.niu.pdf.demo.PDFBoxDemo.getText(PDFBoxDemo.java:15)
	at com.niu.pdf.demo.PDFBoxDemo.main(PDFBoxDemo.java:26)
Caused by: java.lang.NullPointerException
	at java.io.Reader.<init>(Unknown Source)
	at java.io.InputStreamReader.<init>(Unknown Source)
	at org.apache.pdfbox.pdmodel.font.encoding.GlyphList.loadList(GlyphList.java:130)
	at org.apache.pdfbox.pdmodel.font.encoding.GlyphList.<init>(GlyphList.java:111)
	at org.apache.pdfbox.pdmodel.font.encoding.GlyphList.load(GlyphList.java:52)
	at org.apache.pdfbox.pdmodel.font.encoding.GlyphList.<clinit>(GlyphList.java:38)
	... 4 more
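One way to narrow this down, as a rough sketch only: a NullPointerException thrown from java.io.Reader.<init> is what happens when a null stream is handed to InputStreamReader, so GlyphList.loadList was probably given a stream that could not be opened. If that stream comes from a classpath lookup, the glyph list resource may be missing from my build output. The class name GlyphListCheck is hypothetical, and the resource path below is an assumption that is not confirmed by the stack trace.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class GlyphListCheck {
    // ASSUMPTION: guessed location of the glyph list resource inside PDFBox;
    // the stack trace does not show the real path.
    static final String ASSUMED_GLYPHLIST_PATH =
            "org/apache/pdfbox/resources/glyphlist/glyphlist.txt";

    // Returns true if the named resource can be opened from the classpath.
    static boolean isOnClasspath(String path) {
        ClassLoader loader = GlyphListCheck.class.getClassLoader();
        try (InputStream in = loader.getResourceAsStream(path)) {
            return in != null; // a missing resource yields null, not an exception
        } catch (IOException e) {
            return false; // closing the stream failed; treat as not usable
        }
    }

    // Shows that wrapping a null stream fails the same way as in the trace.
    static boolean nullStreamThrowsNpe() {
        try {
            new InputStreamReader((InputStream) null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("null stream throws NPE: " + nullStreamThrowsNpe());
        System.out.println("glyph list found: " + isOnClasspath(ASSUMED_GLYPHLIST_PATH));
    }
}
```

If the second line prints false when run with the same classpath as PDFBoxDemo, the resource files were likely not copied next to the compiled classes.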

Best Regards,
Niuxiaowei

At 2015-07-27 09:13:14, "牛小伟" <[email protected]> wrote:
>Dear Tilman,
>Thanks. Do you know when the 2.0 version will be released?
>
>Best regards
>Niu Xiaowei
>
>--
>Sent from my NetEase Mail mobile app
>
>
>At 2015-07-26 22:07:29, "牛小伟" <[email protected]> wrote:
>>Dear Tilman,
>>Thanks for your support. The original file is at the company,
>>so I can't get it. But I made a simple one using iText.
>>It uses the same encoding, and PDFBox can't process it either.
>>Please check the attachment.
>>
>>Thanks,
>>Best Regards,
>>Niu X
>>
>>
>>At 2015-07-25 15:42:55, "牛小伟" <[email protected]> wrote:
>>>Dear team:
>>>We are using your product PDFBox 1.6 to do text extraction.
>>>But when we process a document with the encoding UniJIS-UCS2-HW-H,
>>>the output is unreadable, like this: (????????????????????????3?????????????).
>>>We have tried some other ways to process it, but they don't work.
>>>We also have some documents with the encoding GBK-EUC-H, and PDFBox
>>>handles those perfectly. I also tried PDFBox 1.8, and it didn't work either.
>>>I checked the charsets in PDFBox. It contains both of the encodings.
>>>I don't know why one works and the other does not.
>>>I hope you can help with this. Many thanks.
>>>
>>>Best Regards.
>>>
>>>Snapshot of the document's encoding:
>>>

