Greetings to the Community!
A simple question: is there a way to whitelist or blacklist certain MIME or file types for OCR? For example, I'd like to extract and OCR embedded images only from PDFs (which, fortunately, is configurable for that parser). The Office parsers, however, always extract and OCR inline images by default, and unfortunately this does not seem to be configurable. How can I turn it off?

I played around with <parser-exclude>, <mime-exclude>, and <mime>, but had no luck. Any ideas? Thanks a lot!