Tilman,

The problem seems to be that version of Acrobat Pro: it ignores any protection bits. When I tried it with another version of Acrobat, both Standard and Pro, everything was correct. Thank you for your time.

Bill Heckle
Programmer
TCEQ Information Resources Division
[email protected]
512.239.0874

-----Original Message-----
From: Tilman Hausherr [mailto:[email protected]]
Sent: Tuesday, July 28, 2015 12:02 PM
To: [email protected]
Subject: Re: unijis-ucs2-hw-h problems

Hello 牛小伟,

Did you use pdfbox and fontbox of the same version? I.e. are you sure that there isn't an old file in your class path?

If yes:
- What is the smallest possible code that reproduces the problem, and does it happen with the file you posted yesterday? (If it is a different file, please upload it somewhere.)
- Does the ExtractText command line feature work on your file, or is there also an error? (Run java -jar pdfbox-app-2.0.0-SNAPSHOT.jar ExtractText <nameofpdf> )

Tilman

On 28.07.2015 at 15:16, 牛小伟 wrote:
> Dear Tilman,
> I found the earlier problem.
> Now I get this error. Please help, thanks:
> Jul 28, 2015 9:11:07 PM java.util.prefs.WindowsPreferences <init>
> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
> Jul 28, 2015 9:11:07 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider loadCache
> WARNING: New fonts found, font cache will be re-built
> Jul 28, 2015 9:11:07 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider <init>
> WARNING: Building font cache, this may take a while
> Jul 28, 2015 9:11:08 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider saveCache
> WARNING: Finished building font cache, found 543 fonts
> Jul 28, 2015 9:11:08 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType0 <init>
> WARNING: Using fallback ArialUnicodeMS for CID-keyed font HeiseiKakuGo-W5
> java.io.IOException: Error: Could not find referenced cmap stream UniJIS-UCS2-HW-H
>     at org.apache.fontbox.cmap.CMapParser.getExternalCMap(CMapParser.java:413)
>     at org.apache.fontbox.cmap.CMapParser.parsePredefined(CMapParser.java:85)
>     at org.apache.pdfbox.pdmodel.font.CMapManager.getPredefinedCMap(CMapManager.java:54)
>     at org.apache.pdfbox.pdmodel.font.PDType0Font.readEncoding(PDType0Font.java:161)
>     at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:109)
>     at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:83)
>     at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:121)
>     at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:50)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:794)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:460)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:437)
>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:148)
>     at org.apache.pdfbox.text.PDFTextStreamEngine.processPage(PDFTextStreamEngine.java:117)
>     at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:367)
>     at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:303)
>     at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:248)
>     at org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:209)
>     at com.niu.pdf.demo.PDFBoxDemo.getText(PDFBoxDemo.java:16)
>     at com.niu.pdf.demo.PDFBoxDemo.null(Unknown Source)
>
> On 2015-07-27 09:13:14, "牛小伟" <[email protected]> wrote:
>> Dear Tilman,
>> Thanks. Do you know when version 2.0 will be released?
>>
>> Best regards
>> Niu Xiaowei
>>
>> --
>> Sent from my NetEase Mail mobile app
>>
>> On 2015-07-26 22:07:29, "牛小伟" <[email protected]> wrote:
>>> Dear Tilman,
>>> Thanks for your support. The original file is at the company, so I can't get it.
>>> But I made a simple one using iText. It uses the same encoding, and PDFBox can't process it either.
>>> Please check the attachment.
>>>
>>> Thanks,
>>> Best Regards,
>>> Niu X
>>>
>>> At 2015-07-25 15:42:55, "牛小伟" <[email protected]> wrote:
>>>> Dear team:
>>>> We are using your product pdfbox 1.6 to do text extraction.
>>>> But when we process the encoding (UniJIS-UCS2-HW-H), the output is unreadable, like this (????????????????????????3?????????????).
>>>> We have tried some other ways to process it, but they don't work.
>>>> We also have some documents with the encoding (GBK-EUC-H), and pdfbox handles those perfectly. I also tried pdfbox 1.8; it didn't work either.
>>>> I checked the charsets of pdfbox. It contains both encodings.
>>>> I don't know why one works and the other does not.
>>>> I hope you can help with this. Many thanks.
>>>>
>>>> Best Regards.
>>>>
>>>> A snapshot of the document's encoding:
>>>>
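For anyone checking Tilman's first question (whether pdfbox and fontbox of different versions are mixed on the class path), the following is a minimal sketch using only the JDK. The class and method names (ClasspathJarCheck, findVersions, isConsistent) are hypothetical and not part of PDFBox. It assumes the jars keep their usual artifact-version.jar file names and only inspects the java.class.path system property, so it will not see jars loaded another way (for example by an application server or inside a repackaged archive).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: lists pdfbox/fontbox jar versions
// found on a class path string so that a version mix is easy to spot.
final class ClasspathJarCheck {

    // Assumption: jars are named like pdfbox-1.8.9.jar or fontbox-2.0.0-SNAPSHOT.jar.
    private static final Pattern JAR =
            Pattern.compile("(pdfbox-app|pdfbox|fontbox)-(\\d[\\w.\\-]*)\\.jar");

    // Returns artifact name -> versions found, in class path order.
    static Map<String, List<String>> findVersions(String classPath, String separator) {
        Map<String, List<String>> versions = new TreeMap<>();
        for (String entry : classPath.split(Pattern.quote(separator))) {
            // Keep only the file name; accept both '/' and '\' as directory separators.
            int cut = Math.max(entry.lastIndexOf('/'), entry.lastIndexOf('\\'));
            Matcher m = JAR.matcher(entry.substring(cut + 1));
            if (m.matches()) {
                versions.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(m.group(2));
            }
        }
        return versions;
    }

    // True when all matched jars carry one and the same version string.
    static boolean isConsistent(Map<String, List<String>> versions) {
        Set<String> distinct = new HashSet<>();
        for (List<String> found : versions.values()) {
            distinct.addAll(found);
        }
        return distinct.size() <= 1;
    }

    public static void main(String[] args) {
        Map<String, List<String>> versions =
                findVersions(System.getProperty("java.class.path"), File.pathSeparator);
        System.out.println("Found: " + versions);
        System.out.println(isConsistent(versions)
                ? "No version mix detected."
                : "Mixed versions on the class path.");
    }
}
```

Running main with the same class path as the failing program prints the versions it finds. An empty result only means no jar matched the assumed naming pattern, not that the class path is clean.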

