OCR black/white listing

jetnet Sun, 17 Apr 2016 13:14:01 -0700

Greetings to the Community!


A simple question: is there a way to whitelist or blacklist certain MIME
types or file types for OCR?
For example, I'd like to extract and OCR embedded images from PDFs only
(fortunately, this is configurable for that parser). By default, the Office
parsers always extract and OCR inline images, and unfortunately that does
not seem to be configurable. How can I turn it off?
I played around with <parser-exclude>, <mime-exclude> and <mime>, but no luck.
Any ideas? Thanks a lot!
