Handling PDFs with missing version in header

Chris Bamford Thu, 10 Oct 2013 01:56:22 -0700

Hi there,

I am attempting text extraction with PDFBox 1.8.2.

For reasons I cannot explain, I am sometimes sent PDFs with no version number 
in the header, e.g.

%PDF-\r\n

instead of, say

%PDF-1.7\r\n

(I have checked: the version number does not appear in the next couple of
lines either.)

This causes PDFParser.parseHeader() to fail, because it ends up calculating a
negative substring offset. My question is:
if I detect this situation and default the header to a very low version
(%PDF-1.0?), would that be safe, or would other things break later on?
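For concreteness, here is a minimal sketch of the "detect and default" idea, done on the raw bytes before PDFBox sees them. The class PdfHeaderFix and its methods are hypothetical helpers, not part of PDFBox, and the sketch assumes the whole file fits in memory. One caveat it makes visible: inserting "1.0" shifts every following byte by 3, and PDF cross-reference tables store absolute byte offsets, so any parser that trusts the xref offsets would be looking 3 bytes too early. Whether PDFBox 1.8.2 relies on those offsets or tolerates the shift is exactly the open question; the sketch does not answer it.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not part of PDFBox): detects a "%PDF-" header that has
 * no version digits and inserts a default version such as "1.0".
 */
public class PdfHeaderFix {

    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** True if data starts with "%PDF-" and the byte after it is not a digit. */
    public static boolean isVersionMissing(byte[] data) {
        if (data.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false; // no PDF header at offset 0
            }
        }
        if (data.length == MAGIC.length) {
            return true; // file ends right after "%PDF-"
        }
        byte next = data[MAGIC.length];
        return next < '0' || next > '9'; // e.g. '\r' in "%PDF-\r\n"
    }

    /**
     * Returns a copy of data with defaultVersion inserted right after "%PDF-",
     * or the same array unchanged if the header already carries a version.
     * NOTE: every byte after the header moves by defaultVersion.length() bytes,
     * so xref byte offsets in the file no longer point at their objects.
     */
    public static byte[] withDefaultVersion(byte[] data, String defaultVersion) {
        if (!isVersionMissing(data)) {
            return data;
        }
        byte[] version = defaultVersion.getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[data.length + version.length];
        System.arraycopy(data, 0, out, 0, MAGIC.length);
        System.arraycopy(version, 0, out, MAGIC.length, version.length);
        System.arraycopy(data, MAGIC.length, out, MAGIC.length + version.length,
                data.length - MAGIC.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] broken = "%PDF-\r\n1 0 obj".getBytes(StandardCharsets.US_ASCII);
        byte[] fixed = withDefaultVersion(broken, "1.0");
        // Show the repaired first line with the line ending made visible.
        String header = new String(fixed, 0, 10, StandardCharsets.US_ASCII)
                .replace("\r", "\\r").replace("\n", "\\n");
        System.out.println(isVersionMissing(broken) + " -> " + header);
    }
}
```

The patched bytes could then be wrapped in a ByteArrayInputStream and passed to PDDocument.load(InputStream) for text extraction; files that already have a version are returned untouched, so the helper can be applied unconditionally.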

Thanks for any help.

- Chris
