On 29.07.2015 at 03:34, 牛小伟 wrote:
Can you give me the Java code you used to process it successfully? Many thanks.
Hello 牛小伟,
I just processed your file with the ExtractText command-line utility.
Now I have also tried it in code, and this works:
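import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper; // PDFBox 2.0 package

PDDocument document = PDDocument.load(new File(pdfFilename), ""); // "" = empty password
PDFTextStripper stripper = new PDFTextStripper();
stripper.setSortByPosition(true); // order the text by its position on the page
System.out.println(stripper.getText(document));
document.close();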
and here's the output I get:
29.07.2015 08:01:32.479 WARN [main] org.apache.pdfbox.pdmodel.font.FileSystemFontProvider:318 - New fonts found, font cache will be re-built
29.07.2015 08:01:32.485 WARN [main] org.apache.pdfbox.pdmodel.font.FileSystemFontProvider:223 - Building font cache, this may take a while
29.07.2015 08:01:33.125 WARN [main] org.apache.pdfbox.pdmodel.font.FileSystemFontProvider:470 - Missing 'name' entry for PostScript name in font C:\Windows\FONTS\Digit.TTF
29.07.2015 08:01:34.488 WARN [main] org.apache.pdfbox.pdmodel.font.FileSystemFontProvider:278 - Finished building font cache, found 404 fonts
29.07.2015 08:01:34.519 WARN [main] org.apache.pdfbox.pdmodel.font.PDCIDFontType0:141 - Using fallback ArialUnicodeMS for CID-keyed font HeiseiKakuGo-W5
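現代・起亜自動車、ハイブリッド車の世界販売台数で3位に返り咲き―韓国メディアItext!
(The Japanese text reads: "Hyundai/Kia back in third place in global hybrid car sales, Korean media reports".)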
If your code doesn't find the resource UniJIS-UCS2-HW-H, then there's
something wrong with your build or your configuration. "UniJIS-UCS2-HW-H"
is here:
org\apache\fontbox\cmap\UniJIS-UCS2-HW-H
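To check from your own program whether that resource is actually visible at runtime, a minimal sketch like the following may help. It only asks the class loader for the path above (class-loader resource names use "/" rather than "\"); it does not call PDFBox itself, and the class name CMapResourceCheck is hypothetical.

```java
import java.net.URL;

public class CMapResourceCheck {
    // Resource path from the mail, written with '/' as class loaders expect
    static final String CMAP = "org/apache/fontbox/cmap/UniJIS-UCS2-HW-H";

    // Returns the URL of the resource, or null if it is not on the classpath
    static URL locate(String resource) {
        return CMapResourceCheck.class.getClassLoader().getResource(resource);
    }

    public static void main(String[] args) {
        URL url = locate(CMAP);
        if (url == null) {
            System.out.println("Not found: " + CMAP + " (is the jar containing it on the classpath?)");
        } else {
            System.out.println("Found: " + url);
        }
    }
}
```

Run it with the same classpath as your PDF code; if it prints "Not found", the jar with the CMap files is missing from that classpath.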
Open your jar file with WinZip or 7-Zip to look at it.
By the way, that directory has 97 entries.
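Instead of an archive tool, the jar can also be inspected with java.util.jar.JarFile. The sketch below is an assumption-laden illustration, not part of PDFBox: the class name JarDirCount is hypothetical, the jar path is whatever jar you expect to contain the CMaps, and it counts only file entries, so the number can differ from what an archive tool shows (and from version to version).

```java
import java.io.IOException;
import java.util.jar.JarFile;

public class JarDirCount {
    // Counts non-directory entries whose name starts with the given prefix
    static long countEntries(String jarPath, String prefix) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                      .filter(e -> !e.isDirectory() && e.getName().startsWith(prefix))
                      .count();
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the jar to inspect (supplied by you)
        String prefix = "org/apache/fontbox/cmap/";
        System.out.println(countEntries(args[0], prefix) + " entries under " + prefix);
    }
}
```

A count of 0 means that jar does not contain the CMap directory at all.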
Tilman