When calling Tika directly from Java with the AutoDetectParser, I notice that 
if a document is corrupt or badly formatted, whether deliberately (malware) 
or accidentally (a bug, say), the underlying parsers will often log an error, 
but Tika does not pass it on to the caller.

Are there any examples out there of how I can be informed of parsing errors? 
Basically, I would like to know that the document has format problems, with as 
much information as possible about what is wrong (though I could live with just 
a count of the errors if that's all that can be done). I don't want to stop the 
parse if the underlying parser can recover, but it would be good to know if it 
aborts before finishing.
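
To make the question concrete, here is a rough Java sketch of the kind of thing 
I am after. It assumes, and I have not verified this, that the underlying 
parsers log through java.util.logging (or are bridged to it). The names 
ParseProblemCollector and collectDuring are made up for illustration and are 
not part of Tika. The idea is to attach a temporary handler to the root logger 
and collect every WARNING or SEVERE record logged while the parse runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical helper (not a Tika class): remembers WARNING-and-above
// log records published while it is attached to a logger.
class ParseProblemCollector extends Handler {
    private final List<String> problems = new ArrayList<>();

    @Override
    public synchronized void publish(LogRecord record) {
        // Ignore INFO and below; keep only warnings and errors.
        if (record.getLevel().intValue() >= Level.WARNING.intValue()) {
            // getMessage() is the raw message, without parameter substitution.
            problems.add(record.getLoggerName() + ": " + record.getMessage());
        }
    }

    @Override
    public void flush() { /* nothing buffered */ }

    @Override
    public void close() { /* no resources to release */ }

    public synchronized List<String> getProblems() {
        return new ArrayList<>(problems);
    }

    // Runs the given task with a collector attached to the root logger and
    // returns what was collected. Assumption: the parsers' loggers propagate
    // to the root logger, which is the java.util.logging default.
    static List<String> collectDuring(Runnable parseTask) {
        Logger root = Logger.getLogger("");
        ParseProblemCollector collector = new ParseProblemCollector();
        root.addHandler(collector);
        try {
            parseTask.run();
        } finally {
            // Always detach, even if the task throws.
            root.removeHandler(collector);
        }
        return collector.getProblems();
    }
}
```

I would wrap the parse call in the Runnable and then inspect or count the 
returned list; an aborted parse would still have to be caught separately as an 
exception from parse(). This feels like a workaround rather than the intended 
way, hence the question.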

Jim
