NullPointerException when font encoding is null

W Bosma Mon, 11 Apr 2011 09:38:07 -0700

Hi,

I'm new to PDF and PDFBox, and I'm trying to find out whether I can use PDFBox
to extract text and text positions from PDF files.


I ran into a NullPointerException that seems to be caused by
COSDictionary.getDictionaryObject(COSName.ENCODING) returning null. This
happens with some PDFs, but not all of them.
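A quick way to narrow this down is to check which font dictionaries have no /Encoding entry at all. The sketch below does not use PDFBox: it models each font dictionary as a plain `Map<String, Object>`, an assumption standing in for `COSDictionary`, and the class and method names are hypothetical. Like `getDictionaryObject` in the report, `Map.get` returns null for an absent key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MissingEncodingCheck {

    // Hypothetical helper: returns the resource names of fonts whose
    // dictionary (modelled as a Map) has no "Encoding" entry.
    static List<String> fontsWithoutEncoding(Map<String, Map<String, Object>> fonts) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> e : fonts.entrySet()) {
            Object encoding = e.getValue().get("Encoding"); // null if the key is absent
            if (encoding == null) {
                missing.add(e.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> fonts = new LinkedHashMap<>();

        Map<String, Object> f1 = new LinkedHashMap<>();
        f1.put("Subtype", "Type1");
        f1.put("BaseFont", "Helvetica");
        f1.put("Encoding", "WinAnsiEncoding");
        fonts.put("F1", f1);

        // Hypothetical font dictionary without an /Encoding entry.
        Map<String, Object> f2 = new LinkedHashMap<>();
        f2.put("Subtype", "TrueType");
        f2.put("BaseFont", "SomeEmbeddedFont");
        fonts.put("F2", f2);

        System.out.println(fontsWithoutEncoding(fonts)); // prints [F2]
    }
}
```

Fonts reported by a check like this are the ones for which `PDFont.getEncoding()` would be expected to come back null.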

This is what I did. I created a fairly simple application to start with:

PDFTextStripper printer = new PDFTextStripper();
Writer out = new OutputStreamWriter(System.out);
printer.writeText(document, out);
out.flush(); // make sure buffered output actually reaches System.out

This throws the following NullPointerException:

java.lang.NullPointerException
        at org.apache.pdfbox.pdmodel.font.PDSimpleFont.getFontHeight(PDSimpleFont.java:136)
        at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:408)
        at org.apache.pdfbox.util.operator.ShowText.process(ShowText.java:45)
        at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:551)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:274)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
        at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:225)
        at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:442)
        at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:366)
        at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:322)
        at TestRun.main(TestRun.java:79)

Line 136 of PDSimpleFont.java dereferences the 'encoding' field, which seems
to be null because PDFont.getEncoding() returns null. I added some debug
output to the PDFBox code, and it appears that
COSDictionary.getDictionaryObject(COSName.ENCODING), which is called from
PDFont.getEncoding(), occasionally returns null (presumably for fonts whose
dictionary has no /Encoding entry). That null value is what causes the
NullPointerException above.
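In PDF, the /Encoding entry of a simple font dictionary is optional, so code that reads it has to cope with it being absent. One common pattern is to substitute a fallback instead of dereferencing null. The sketch below illustrates only that pattern: `SimpleFontModel`, `Encoding`, `getFontHeight` and `DEFAULT_ENCODING` are hypothetical stand-ins, not PDFBox's actual classes. For a real font the correct fallback would usually be the font's built-in encoding rather than a fixed table.

```java
import java.util.HashMap;
import java.util.Map;

public class EncodingFallback {

    // Hypothetical minimal encoding: maps a character code to a glyph name.
    interface Encoding {
        String getName(int code);
    }

    // Hypothetical fallback table used when no /Encoding entry exists
    // (only code 65 -> "A" is defined here, everything else is ".notdef").
    static final Encoding DEFAULT_ENCODING = code -> code == 65 ? "A" : ".notdef";

    // Hypothetical model of a simple font, reduced to what the example needs.
    static final class SimpleFontModel {
        private final Encoding encoding;          // may be null, as in the report
        private final Map<String, Float> heights; // glyph name -> glyph height

        SimpleFontModel(Encoding encoding, Map<String, Float> heights) {
            this.encoding = encoding;
            this.heights = heights;
        }

        // Unsafe variant: throws NullPointerException when encoding is null.
        float getFontHeightUnsafe(int code) {
            return heights.getOrDefault(encoding.getName(code), 0f);
        }

        // Null-safe variant: uses DEFAULT_ENCODING when encoding is null.
        float getFontHeight(int code) {
            Encoding enc = (encoding != null) ? encoding : DEFAULT_ENCODING;
            return heights.getOrDefault(enc.getName(code), 0f);
        }
    }

    public static void main(String[] args) {
        Map<String, Float> heights = new HashMap<>();
        heights.put("A", 700f);
        SimpleFontModel font = new SimpleFontModel(null, heights); // no /Encoding

        System.out.println(font.getFontHeight(65)); // prints 700.0
        try {
            font.getFontHeightUnsafe(65);
        } catch (NullPointerException e) {
            System.out.println("unsafe variant threw NullPointerException");
        }
    }
}
```

Whether a fallback like this or an explicit error is preferable depends on whether approximate glyph metrics are acceptable for the text-position use case.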

Any clues? Is there a way to fix or work around this?

Wouter
