Is there a way, using tika-config.xml, to allow PDF files to be OCR'ed (with extractInlineImages=true) while either skipping OCR for specific image formats (JPG, GIF) or disabling OCR for all image/* MIME types?
I tried:

  <parser class="org.apache.tika.parser.ocr.TesseractOCRParser">
    <mime-exclude>image/*</mime-exclude>
  </parser>

but had no luck.
Thanks.
--
Greg Lepore
Information Technology Specialist
National Archives at College Park
