When calling Tika directly from Java with the AutoDetectParser, I notice that if a document is corrupt or badly formatted, whether deliberately (malware) or accidentally (a bug, say), the underlying parsers will often log an error, but Tika does not pass it on.
Are there any examples out there of how I can be informed of parsing errors? Basically, I would like to know that the document has format problems, with as much information as possible about what is wrong (though I could live with just a count of the errors if that's all that can be done). I don't want to stop the parse if the underlying parser can recover, but it would be good to know if it aborts before finishing.

Jim
