I'll send a note over to the PDFBox list and ask what they think.

Thanks,
Daniel

On 7/1/2014 11:51 AM, Nick Burch wrote:

On Fri, 27 Jun 2014, Daniel Gibby wrote:
java.io.IOException: Error: Header doesn't contain versioninfo
   at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:335)
   at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:177)
   at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1238)
   at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1203)
   at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:111)
...
Shouldn't this be a TikaException of some type, or at least something other than just an IOException?

One option might be to catch the IOException in the Tika code, then re-throw it as a TikaException. However, I'd probably prefer it if we could get the PDFBox project to make it a more specific exception, which we could then catch and re-throw as a TikaException. I'm not sure we want to be catching all PDFBox IOExceptions, as that might mask a real IOException?

Nick
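
For illustration, the approach Nick prefers might look roughly like the sketch below. It is only a sketch under stated assumptions: MalformedPdfException is hypothetical (PDFBox throws a plain IOException in the trace above), ParseException is a local stand-in for Tika's TikaException so the example compiles on its own, and Loader stands in for the PDDocument.load call.

```java
import java.io.IOException;

final class RethrowSketch {

    // Hypothetical: stands in for the "more specific exception" PDFBox
    // might throw for a malformed document. No such class is assumed to exist.
    static class MalformedPdfException extends IOException {
        MalformedPdfException(String message) {
            super(message);
        }
    }

    // Local stand-in for org.apache.tika.exception.TikaException, so the
    // sketch compiles without Tika on the classpath.
    static class ParseException extends Exception {
        ParseException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Stand-in for the call that loads the document (PDDocument.load in the trace).
    interface Loader {
        void load() throws IOException;
    }

    static void parse(Loader loader) throws IOException, ParseException {
        try {
            loader.load();
        } catch (MalformedPdfException e) {
            // Bad document: report it as a parse failure and keep the cause.
            throw new ParseException("Unable to parse PDF: " + e.getMessage(), e);
        }
        // Any other IOException is not caught here, so a genuine I/O
        // failure still reaches the caller unchanged.
    }
}
```

Because only the specific subclass is caught, a real I/O problem (for example a failed read) is not masked as a parse failure, which is the concern raised above.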
