Hi,

The best approach would be to create an issue in JIRA and attach the file there, if it isn't confidential.

Re "the latest", did you use a 1.8 version or a 2.0 version?
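
If the version is not obvious from the jar file name, a minimal sketch like the one below may help. It uses only standard Java reflection and assumes the PDFBox jar is on the classpath and that its manifest carries an Implementation-Version entry; the class name PdfBoxVersionCheck and the helper implementationVersion are made up for this illustration and are not part of PDFBox.

```java
public class PdfBoxVersionCheck {

    // Hypothetical helper: returns the Implementation-Version recorded in the
    // manifest of the jar that provides the given class, or a short message
    // if the class or the manifest entry is not available.
    static String implementationVersion(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class<?> c = Class.forName(className, false,
                    PdfBoxVersionCheck.class.getClassLoader());
            Package p = c.getPackage();
            String v = (p == null) ? null : p.getImplementationVersion();
            return (v != null) ? v : "unknown (no Implementation-Version in manifest)";
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // PDDocument is the class that appears in the PDFBox stack traces;
        // it is looked up by name so this file compiles without PDFBox.
        System.out.println("PDFBox version: "
                + implementationVersion("org.apache.pdfbox.pdmodel.PDDocument"));
    }
}
```

Run it with the same classpath as the failing extraction, so that it reports on the jar that is actually being loaded.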

Tilman

On 10.03.2014 21:19, Craig Strong wrote:
I have been using PDFBox to extract text from several different PDF files
without problems. I use the latest PDFBox app with the ExtractText class.
There is one PDF from which PDFBox (and iText) fails to extract any text, even
though I can extract the text with Adobe Reader and also with pdftotext.exe,
which is part of Xpdf. I don't want to rely on running pdftotext.exe from a PC,
since this is part of an automated application.

I think the error relates to an unknown font type and to having to use the few
fonts bundled in the jar file. I tried calling the API classes directly and
forcing a font from a specific location, but I still got errors. I thought I
had loaded the font with the loadTTF method, but I don't know whether that had
any effect. I would really like to have this working straight from the
ExtractText class anyway. I'm thinking I might have to build my own version
after putting a bunch of Windows fonts somewhere and changing a properties
file, but I really don't know if that is the right direction, and I am new to
PDFBox. Any ideas?

Here are the errors I am getting. I tried this on both a Windows PC and our
system and got the same errors. The section starting at processEncodedText
repeats a few times, so I included only the first entries.

Mar 10, 2014 3:50:44 PM org.apache.pdfbox.pdmodel.font.PDFontFactory createFont
WARNING: Substituting TrueType for unknown font subtype=
Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
WARNING: java.lang.NullPointerException
Throwable occurred: java.lang.NullPointerException
         at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.loadDescriptorDictionary(PDTrueTypeFont.java:375)
         at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.ensureFontDescriptor(PDTrueTypeFont.java:221)
         at org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.<init>(PDTrueTypeFont.java:119)
         at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:121)
         at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:204)
         at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:604)
         at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:54)
         at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
         at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
         at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
         at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
         at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)
         at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381)
         at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
         at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)
         at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
         at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)
Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processEncodedText
WARNING: java.lang.NullPointerException
Throwable occurred: java.lang.NullPointerException
         at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:355)
         at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
         at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
         at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
         at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
         at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
         at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)
         at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381)
         at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
         at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)
         at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
         at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)
Mar 10, 2014 3:50:45 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
WARNING: java.lang.NullPointerException
Throwable occurred: java.lang.NullPointerException
         at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:364)
         at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
         at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:554)
         at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:268)
         at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
         at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
         at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)
         at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381)
         at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
         at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:275)
         at org.apache.pdfbox.ExtractText.main(ExtractText.java:85)
         at org.apache.pdfbox.PDFBox.main(PDFBox.java:58)

Thanks,
Craig Strong

