ht to PDFParser?
I'm wondering if perhaps PDFParser doesn't use the TaggedInputStream,
but AutoDetectParser does?
Thanks,
Daniel Gibby
<mailto:dgi...@edirectpublishing.com>
Parser();
parser.parse(stream, textHandler, metadata, parseContext);
} catch (IOException e) {
tagged.throwIfCauseOf(e);
throw new TikaException("Parse error", e);
}
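
For anyone following along, here is a rough, stdlib-only sketch of the tagging idea behind the snippet above: wrap the input stream so that any IOException raised by the stream itself is marked, which lets the catch block tell a genuine read failure apart from an IOException thrown by the parser's own logic. The names TaggingStream, TaggedIOException, ToyParser and classify are all made up for illustration; this is not Tika's TaggedInputStream, and I have not checked how the real class is implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class TaggedStreamSketch {

    /** Hypothetical IOException that remembers which wrapper observed the failure. */
    static class TaggedIOException extends IOException {
        final Object tag;

        TaggedIOException(IOException cause, Object tag) {
            super(cause.getMessage(), cause);
            this.tag = tag;
        }
    }

    /** Hypothetical wrapper: rethrows IOExceptions from the underlying stream with a tag. */
    static class TaggingStream extends FilterInputStream {
        TaggingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            try {
                return super.read();
            } catch (IOException e) {
                throw new TaggedIOException(e, this);
            }
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            try {
                return super.read(b, off, len);
            } catch (IOException e) {
                throw new TaggedIOException(e, this);
            }
        }

        /** True only if the exception was raised by this particular wrapper. */
        boolean isCauseOf(Exception e) {
            return e instanceof TaggedIOException && ((TaggedIOException) e).tag == this;
        }
    }

    /** Stand-in for a parser; an assumption for this sketch, not a Tika interface. */
    interface ToyParser {
        void parse(InputStream in) throws IOException;
    }

    /** Runs the parser and reports where an IOException, if any, came from. */
    static String classify(InputStream raw, ToyParser parser) {
        TaggingStream tagged = new TaggingStream(raw);
        try {
            parser.parse(tagged);
            return "ok";
        } catch (IOException e) {
            // Tagged: the stream failed. Untagged: the parser threw it by itself.
            return tagged.isCauseOf(e) ? "stream failure" : "parse error";
        }
    }

    public static void main(String[] args) {
        InputStream broken = new InputStream() {
            @Override
            public int read() throws IOException {
                throw new IOException("simulated read failure");
            }
        };
        System.out.println(classify(broken, in -> in.read()));
        System.out.println(classify(new ByteArrayInputStream(new byte[] {1}), in -> {
            throw new IOException("Error: Header doesn't contain versioninfo");
        }));
    }
}
```

In this toy version the first call reports a stream failure and the second a parse error, which is the distinction I would expect the catch block above to make.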
Thanks,
Daniel Gibby
I'll send a note over to the PDFBox list and ask what they think.
Thanks,
Daniel
On 7/1/2014 11:51 AM, Nick Burch wrote:
On Fri, 27 Jun 2014, Daniel Gibby wrote:
java.io.IOException: Error: Header doesn't contain versioninfo
    at org.apache.pdfbox.pdfparser.PDFParser.p
th file uploads, IOExceptions can easily
happen in other ways.
Shouldn't this be a TikaException of some type, or at least something
other than just an IOException?
--
Thanks,
Daniel Gibby
Is the PDF conversion a part of a separate project like the MS word
document conversion is?
I recently helped find and test a bug fix for a .docx conversion
problem, and PDF conversion has various issues I'd like to help with as
well.
Is the PDF conversion code all within the main Tika proje