Dear Tilman,

I got it, thanks for your support.
Best Regards,
Niuxiaowei

At 2015-07-29 09:34:43, "牛小伟" <[email protected]> wrote:
>Dear Tilman,
>Can you give me the Java code with which you processed it successfully? Many thanks.
>
>--
>Sent from my NetEase Mail mobile app
>
>At 2015-07-28 21:16:16, "牛小伟" <[email protected]> wrote:
>>Dear Tilman,
>>I found the previous problem.
>>Now I get this error. Please help, thanks:
>>Jul 28, 2015 9:11:07 PM java.util.prefs.WindowsPreferences <init>
>>WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
>>Jul 28, 2015 9:11:07 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider loadCache
>>WARNING: New fonts found, font cache will be re-built
>>Jul 28, 2015 9:11:07 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider <init>
>>WARNING: Building font cache, this may take a while
>>Jul 28, 2015 9:11:08 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider saveCache
>>WARNING: Finished building font cache, found 543 fonts
>>Jul 28, 2015 9:11:08 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType0 <init>
>>WARNING: Using fallback ArialUnicodeMS for CID-keyed font HeiseiKakuGo-W5
>>java.io.IOException: Error: Could not find referenced cmap stream UniJIS-UCS2-HW-H
>>at org.apache.fontbox.cmap.CMapParser.getExternalCMap(CMapParser.java:413)
>>at org.apache.fontbox.cmap.CMapParser.parsePredefined(CMapParser.java:85)
>>at org.apache.pdfbox.pdmodel.font.CMapManager.getPredefinedCMap(CMapManager.java:54)
>>at org.apache.pdfbox.pdmodel.font.PDType0Font.readEncoding(PDType0Font.java:161)
>>at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:109)
>>at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:83)
>>at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:121)
>>at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:50)
>>at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:794)
>>at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:460)
>>at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:437)
>>at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:148)
>>at org.apache.pdfbox.text.PDFTextStreamEngine.processPage(PDFTextStreamEngine.java:117)
>>at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:367)
>>at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:303)
>>at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:248)
>>at org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:209)
>>at com.niu.pdf.demo.PDFBoxDemo.getText(PDFBoxDemo.java:16)
>>at com.niu.pdf.demo.PDFBoxDemo.null(Unknown Source)
>>
>>At 2015-07-27 09:13:14, "牛小伟" <[email protected]> wrote:
>>>Dear Tilman,
>>>Thanks. Do you know when the 2.0 version will be released?
>>>
>>>Best regards
>>>Niu Xiaowei
>>>
>>>--
>>>Sent from my NetEase Mail mobile app
>>>
>>>At 2015-07-26 22:07:29, "牛小伟" <[email protected]> wrote:
>>>>Dear Tilman,
>>>>Thanks for your support. The original file is at the company,
>>>>so I can't get it. But I made a simple one using iText.
>>>>They use the same encoding. PDFBox can't process it either.
>>>>Please check the attachment.
>>>>
>>>>Thanks,
>>>>Best Regards,
>>>>Niu X
>>>>
>>>>At 2015-07-25 15:42:55, "牛小伟" <[email protected]> wrote:
>>>>>Dear team:
>>>>>We are using your product PDFBox 1.6 to do text extraction.
>>>>>But when we process the encoding (UniJIS-UCS2-HW-H),
>>>>>the output is unreadable, like
>>>>>this (????????????????????????3?????????????).
>>>>>We have tried some other ways to process it, but they don't work.
>>>>>We also have some documents with the encoding (GBK-EUC-H), and PDFBox
>>>>>handles them perfectly. I also tried PDFBox 1.8; it didn't work either.
>>>>>I checked the charsets in PDFBox.
>>>>>It contains both of the encodings.
>>>>>I don't know why one works and the other does not.
>>>>>I hope you can help with this. Many thanks.
>>>>>
>>>>>Best regards.
>>>>>
>>>>>A snapshot of the document's encoding:
>>>>>

