Dear Tilman,
I found the cause of the earlier problem. Now I get this error. Please help, thanks:

Jul 28, 2015 9:11:07 PM java.util.prefs.WindowsPreferences <init>
WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
Jul 28, 2015 9:11:07 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider loadCache
WARNING: New fonts found, font cache will be re-built
Jul 28, 2015 9:11:07 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider <init>
WARNING: Building font cache, this may take a while
Jul 28, 2015 9:11:08 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider saveCache
WARNING: Finished building font cache, found 543 fonts
Jul 28, 2015 9:11:08 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType0 <init>
WARNING: Using fallback ArialUnicodeMS for CID-keyed font HeiseiKakuGo-W5
java.io.IOException: Error: Could not find referenced cmap stream UniJIS-UCS2-HW-H
	at org.apache.fontbox.cmap.CMapParser.getExternalCMap(CMapParser.java:413)
	at org.apache.fontbox.cmap.CMapParser.parsePredefined(CMapParser.java:85)
	at org.apache.pdfbox.pdmodel.font.CMapManager.getPredefinedCMap(CMapManager.java:54)
	at org.apache.pdfbox.pdmodel.font.PDType0Font.readEncoding(PDType0Font.java:161)
	at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:109)
	at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:83)
	at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:121)
	at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:50)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:794)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:460)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:437)
	at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:148)
	at org.apache.pdfbox.text.PDFTextStreamEngine.processPage(PDFTextStreamEngine.java:117)
	at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:367)
	at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:303)
	at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:248)
	at org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:209)
	at com.niu.pdf.demo.PDFBoxDemo.getText(PDFBoxDemo.java:16)
	at com.niu.pdf.demo.PDFBoxDemo.null(Unknown Source)
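The exception says the parser could not locate the predefined CMap UniJIS-UCS2-HW-H as a resource, while the GBK-EUC-H documents work. A minimal stdlib-only sketch for checking whether a named CMap resource is visible on the classpath follows. The class name CMapResourceCheck is hypothetical, and the directory org/apache/fontbox/cmap/ is an assumption about where the FontBox jar bundles its CMap files, so adjust it to match the jar actually in use.

```java
import java.net.URL;

/**
 * Hypothetical diagnostic: reports whether predefined CMap files can be
 * found as classpath resources. CMAP_DIR is an ASSUMPTION about the
 * FontBox jar layout, not a documented PDFBox constant.
 */
public class CMapResourceCheck {

    // Assumed location of the bundled CMap files inside the fontbox jar.
    static final String CMAP_DIR = "org/apache/fontbox/cmap/";

    /** Returns true if the named CMap resource is visible to the given class loader. */
    static boolean hasCMap(ClassLoader loader, String cmapName) {
        URL url = loader.getResource(CMAP_DIR + cmapName);
        return url != null;
    }

    public static void main(String[] args) {
        ClassLoader loader = CMapResourceCheck.class.getClassLoader();
        // Check the CMap names given on the command line, or the two from this thread.
        String[] names = args.length > 0 ? args
                : new String[] {"UniJIS-UCS2-HW-H", "GBK-EUC-H"};
        for (String name : names) {
            System.out.println(name + ": " + (hasCMap(loader, name) ? "found" : "MISSING"));
        }
    }
}
```

Run it with the same classpath as the application (fontbox and pdfbox jars included); if GBK-EUC-H is reported as found but UniJIS-UCS2-HW-H as MISSING, that would be consistent with the CMap file simply not being bundled in that build.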
On 2015-07-27 09:13:14, "牛小伟" <[email protected]> wrote:
>Dear Tilman,
> Thanks. Do you know when version 2.0 will be released?
>
>Best regards
>Niu Xiaowei
>
>--
>Sent from my NetEase Mail mobile app
>
>
>On 2015-07-26 22:07:29, "牛小伟" <[email protected]> wrote:
>>Dear Tilman,
>>Thanks for your support. The original file is at the company,
>>so I can't get it. But I made a simple one using iText.
>>It uses the same encoding, and PDFBox can't process it either.
>>Please check the attachment.
>>
>>Thanks,
>>Best Regards,
>>Niu X
>>
>>At 2015-07-25 15:42:55, "牛小伟" <[email protected]> wrote:
>>>Dear team:
>>> We are using your product PDFBox 1.6 for text extraction.
>>>But when we process documents with the encoding (UniJIS-UCS2-HW-H),
>>>the output is unreadable, like this: (????????????????????????3?????????????).
>>>We have tried some other approaches, but they don't work.
>>>We also have some documents with the encoding (GBK-EUC-H), and PDFBox
>>>handles them perfectly. I also tried PDFBox 1.8, and it didn't work either.
>>>I checked the charsets bundled with PDFBox, and both encodings are included.
>>>I don't know why one works and the other doesn't.
>>>We hope you can help with this. Many thanks.
>>>
>>>Best regards.
>>>
>>>The document snapshot of the encoding:

