Hello, I am interested in trying out Tika with a different OCR engine and am wondering how Tesseract is integrated. Is it possible to write a plugin that calls a different engine? For images this seems much easier, since I can just detect the file type and use an OCR engine instead. For scanned PDFs, however, I assume there is some bi-directional communication between Tika and Tesseract to detect inline images. Is that correct?
Thanks, Cristi