[ https://issues.apache.org/jira/browse/PDFBOX-1241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218985#comment-13218985 ]
Maruan Sahyoun commented on PDFBOX-1241:
----------------------------------------

For the files you came across, is only the %%EOF missing, or is the offset missing as well? From my perspective these are separate issues that need to be dealt with separately.

> Better handle of missing EOF at the end of a file
> -------------------------------------------------
>
>                 Key: PDFBOX-1241
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1241
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing, Text extraction
>    Affects Versions: 1.6.0
>         Environment: All platforms affected
>            Reporter: Ernst Eibensteiner
>         Attachments: On the Insert tab.pdf
>
>
> We came across PDF files that do not have a %%EOF at the end of the file.
> This leads to the following exception:
> c:\tmp> java -jar pdfbox-app-1.6.0.jar ExtractText -endPage 1 "On the Insert tab.pdf"
> ExtractText failed with the following exception:
> java.io.IOException: Error: Expected an integer type, actual=''
>         at org.apache.pdfbox.pdfparser.BaseParser.readInt(BaseParser.java:1384)
>         at org.apache.pdfbox.pdfparser.PDFParser.parseStartXref(PDFParser.java:663)
>         at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:464)
>         at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1088)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:978)
>         at org.apache.pdfbox.ExtractText.startExtraction(ExtractText.java:196)
>         at org.apache.pdfbox.ExtractText.main(ExtractText.java:76)
>         at org.apache.pdfbox.PDFBox.main(PDFBox.java:42)
> While these PDFs are non-conforming, it'd be an improvement to allow them to
> be read and processed.

--
This message is automatically generated by JIRA.
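
The question of whether only the %%EOF marker is missing or the offset as well can be checked without PDFBox by looking at the tail of an affected file. The following is a minimal sketch, not PDFBox code: the class name `TrailerCheck`, the method `inspect`, the `Result` holder and the 2048-byte tail window are all assumptions made for illustration. It reports whether a `startxref` keyword is present, whether an integer follows it, and whether a `%%EOF` marker comes after it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic helper; not part of PDFBox.
final class TrailerCheck {

    /** What was found in the tail of the file. */
    static final class Result {
        final boolean hasStartXref; // "startxref" keyword found in the tail
        final long offset;          // integer after "startxref", or -1 if none
        final boolean hasEof;       // "%%EOF" found after the last "startxref"

        Result(boolean hasStartXref, long offset, boolean hasEof) {
            this.hasStartXref = hasStartXref;
            this.offset = offset;
            this.hasEof = hasEof;
        }
    }

    /** Inspects the last bytes of a PDF (the window size is an assumption). */
    static Result inspect(byte[] data) {
        int tailLen = Math.min(data.length, 2048);
        // ISO-8859-1 maps every byte to exactly one char, so indices stay aligned.
        String tail = new String(data, data.length - tailLen, tailLen,
                StandardCharsets.ISO_8859_1);

        int kw = tail.lastIndexOf("startxref");
        long offset = -1;
        if (kw >= 0) {
            int i = kw + "startxref".length();
            // Skip the whitespace / end-of-line between keyword and number.
            while (i < tail.length() && Character.isWhitespace(tail.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < tail.length() && tail.charAt(i) >= '0' && tail.charAt(i) <= '9') {
                i++;
            }
            // Up to 18 digits always fits into a long.
            if (i > start && i - start <= 18) {
                offset = Long.parseLong(tail.substring(start, i));
            }
        }
        // Only a marker after the last startxref closes the final trailer.
        boolean hasEof = tail.indexOf("%%EOF", Math.max(kw, 0)) >= 0;
        return new Result(kw >= 0, offset, hasEof);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TrailerCheck <file.pdf>");
            return;
        }
        Result r = inspect(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("startxref present: " + r.hasStartXref);
        System.out.println("offset:            " + (r.offset >= 0 ? r.offset : "missing"));
        System.out.println("%%EOF present:     " + r.hasEof);
    }
}
```

Running this against the attached file would show which of the two cases applies. Since the exception in the trace is raised by readInt while in parseStartXref with actual='', a missing offset seems at least possible, but that is a guess until the file has been inspected.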