Dear Tilman,
I have checked out the code. But when I create a PDFTextStripper:
PDFTextStripper pts = new PDFTextStripper();
I get the errors below, and I don't know why:
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.pdfbox.text.PDFTextStreamEngine.<init>(PDFTextStreamEngine.java:103)
at org.apache.pdfbox.text.PDFTextStripper.<init>(PDFTextStripper.java:194)
at com.niu.pdf.demo.PDFBoxDemo.getText(PDFBoxDemo.java:15)
at com.niu.pdf.demo.PDFBoxDemo.main(PDFBoxDemo.java:26)
Caused by: java.lang.NullPointerException
at java.io.Reader.<init>(Unknown Source)
at java.io.InputStreamReader.<init>(Unknown Source)
at org.apache.pdfbox.pdmodel.font.encoding.GlyphList.loadList(GlyphList.java:130)
at org.apache.pdfbox.pdmodel.font.encoding.GlyphList.<init>(GlyphList.java:111)
at org.apache.pdfbox.pdmodel.font.encoding.GlyphList.load(GlyphList.java:52)
at org.apache.pdfbox.pdmodel.font.encoding.GlyphList.<clinit>(GlyphList.java:38)
... 4 more
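
A minimal sketch of one check that might narrow this down. It assumes the NullPointerException in InputStreamReader means a glyph list resource could not be found on the classpath. The resource path below is an assumption and may differ in the checked-out code, and ResourceCheck and isOnClasspath are hypothetical names used only for illustration:

```java
import java.net.URL;

public class ResourceCheck {

    // Hypothetical helper: returns true if the named resource is visible
    // to the class loader that loaded this class.
    static boolean isOnClasspath(String resourcePath) {
        ClassLoader loader = ResourceCheck.class.getClassLoader();
        if (loader == null) {
            // Fall back to the system class loader if loaded by the bootstrap loader.
            loader = ClassLoader.getSystemClassLoader();
        }
        URL url = loader.getResource(resourcePath);
        return url != null;
    }

    public static void main(String[] args) {
        // Assumed default path; pass the real path as the first argument if it differs.
        String path = args.length > 0
                ? args[0]
                : "org/apache/pdfbox/resources/glyphlist/glyphlist.txt";
        System.out.println(path + (isOnClasspath(path) ? " found" : " NOT found on classpath"));
    }
}
```

If the resource is reported as not found, the resources directory of the checkout is probably not part of the build output or the run configuration's classpath.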


Best Regards,
Niuxiaowei






At 2015-07-27 09:13:14, "牛小伟" <[email protected]> wrote:
>Dear Tilman,
>Thanks. Do you know when version 2.0 will be released?
>
>Best regards
>Niu Xiaowei
>
>--
>Sent from my NetEase Mail mobile client
>
>
>At 2015-07-26 22:07:29, "牛小伟" <[email protected]> wrote:
>>Dear Tilman,
>>Thanks for your support. The original file is at my company,
>>so I can't get it. But I made a simple one using iText.
>>It uses the same encoding, and PDFBox can't process it either.
>>Please check the attachment.
>>
>>
>>Thanks,
>>Best Regards,
>>Niu X
>>
>>
>>
>>
>>
>>
>>
>>
>>At 2015-07-25 15:42:55, "牛小伟" <[email protected]> wrote:
>>>Dear team,
>>>We are using your product PDFBox 1.6 for text extraction.
>>>But when we process a document with the encoding UniJIS-UCS2-HW-H,
>>>the output is unreadable, like this: (????????????????????????3?????????????).
>>>We have tried some other ways to process it, but they don't work.
>>>We also have some documents with the encoding GBK-EUC-H, and PDFBox
>>>handles those perfectly. I also tried PDFBox 1.8, and it didn't work either.
>>>I checked the character sets shipped with PDFBox, and both encodings are included.
>>>I don't know why one works and the other does not.
>>>I hope you can help with this. Thank you very much.
>>>
>>>
>>>Best regards.
>>>
>>>
>>>A snapshot of the document's encoding:
>>>
>>>
>>
>>
>>
>>
>>
>>
