Hello 牛小伟,

Are you using the same version of pdfbox and fontbox? I.e. are you sure that there isn't an old file in your class path?

If yes:

- What is the smallest possible code that reproduces the problem, and does it happen with the file you posted yesterday? (If it is a different file, please upload it somewhere.)
- Does the ExtractText command line feature work on your file, or is there also an error? (Run: java -jar pdfbox-app-2.0.0-SNAPSHOT.jar ExtractText <nameofpdf>)
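
In case it helps with the class path question, here is a minimal sketch that uses only the plain JDK and lists every location on the class path that contains a given class. The class ClasspathCheck and its two methods are made up for this example and are not part of PDFBox; the two class names it checks are taken from the stack trace. If either class shows up more than once, or the jar names in the URLs carry different version numbers, an old file is probably being picked up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of PDFBox): lists every copy of a class
// that is visible on the class path.
class ClasspathCheck {

    // Converts a fully qualified class name to its class-file resource path.
    static String toResourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns the URLs of all class path locations that contain the resource.
    static List<URL> findCopies(String resourcePath) {
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            return Collections.list(loader.getResources(resourcePath));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Class names as they appear in the stack trace.
        String[] classNames = {
            "org.apache.pdfbox.text.PDFTextStripper",
            "org.apache.fontbox.cmap.CMapParser"
        };
        for (String name : classNames) {
            List<URL> copies = findCopies(toResourcePath(name));
            System.out.println(name + ": found " + copies.size() + " time(s)");
            for (URL url : copies) {
                System.out.println("  " + url);
            }
        }
    }
}
```

Run it with exactly the same class path as your application, otherwise the result says nothing about the failing setup.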

Tilman


On 28.07.2015 at 15:16, 牛小伟 wrote:
Dear Tilman,
I found the earlier problem.
Now I get this error. Please help, thanks:
July 28, 2015 9:11:07 PM java.util.prefs.WindowsPreferences <init>
WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
July 28, 2015 9:11:07 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider loadCache
WARNING: New fonts found, font cache will be re-built
July 28, 2015 9:11:07 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider <init>
WARNING: Building font cache, this may take a while
July 28, 2015 9:11:08 PM org.apache.pdfbox.pdmodel.font.FileSystemFontProvider saveCache
WARNING: Finished building font cache, found 543 fonts
July 28, 2015 9:11:08 PM org.apache.pdfbox.pdmodel.font.PDCIDFontType0 <init>
WARNING: Using fallback ArialUnicodeMS for CID-keyed font HeiseiKakuGo-W5
java.io.IOException: Error: Could not find referenced cmap stream UniJIS-UCS2-HW-H
    at org.apache.fontbox.cmap.CMapParser.getExternalCMap(CMapParser.java:413)
    at org.apache.fontbox.cmap.CMapParser.parsePredefined(CMapParser.java:85)
    at org.apache.pdfbox.pdmodel.font.CMapManager.getPredefinedCMap(CMapManager.java:54)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.readEncoding(PDType0Font.java:161)
    at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:109)
    at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:83)
    at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:121)
    at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:50)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:794)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:460)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:437)
    at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:148)
    at org.apache.pdfbox.text.PDFTextStreamEngine.processPage(PDFTextStreamEngine.java:117)
    at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:367)
    at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:303)
    at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:248)
    at org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:209)
    at com.niu.pdf.demo.PDFBoxDemo.getText(PDFBoxDemo.java:16)
    at com.niu.pdf.demo.PDFBoxDemo.null(Unknown Source)










At 2015-07-27 09:13:14, "牛小伟" <[email protected]> wrote:
Dear Tilman,
     Thanks. Do you know when version 2.0 will be released?

Best regards
Niu Xiaowei



At 2015-07-26 22:07:29, "牛小伟" <[email protected]> wrote:
Dear Tilman,
Thanks for your support. The original file is at my company,
and I can't get it. But I made a simple one using iText.
Both files use the same encoding, and PDFBox can't process this one either.
Please check the attachment.


Thanks,
Best Regards,
Niu X








At 2015-07-25 15:42:55, "牛小伟" <[email protected]> wrote:
Dear team,
         We are using your product PDFBox 1.6 for text extraction.
But when we process a document with the encoding UniJIS-UCS2-HW-H,
the output is unreadable, like this: (????????????????????????3?????????????).
We have tried some other ways to process it, but they don't work.
We also have some documents with the encoding GBK-EUC-H, and PDFBox
handles those perfectly. I also tried PDFBox 1.8, and it didn't work either.
I checked the character sets in PDFBox, and it contains both encodings.
I don't know why one works and the other does not.
I hope you can help with this. Thank you very much.


Best regards.


A screenshot of the document's encoding:









