Is there a way, using tika-config.xml, to have PDF files OCR'ed (with
extractInlineImages=true) while not performing OCR on either specific image
formats (JPG, GIF) or on any image/* MIME type at all?

I tried the following:
<parser class="org.apache.tika.parser.ocr.TesseractOCRParser">
  <mime-exclude>image/*</mime-exclude>
</parser>

but no luck.

Thanks.

-- 
Greg Lepore
Information Technology Specialist
National Archives at College Park
