Hello,

I am interested in trying out Tika with a different OCR engine and am
wondering how Tesseract is integrated. Is it possible to write a plugin that
calls a different engine? For images this seems straightforward: Tika can
just detect the file type and hand the file to an OCR engine instead. For
scanned PDFs, however, I assume there is some bi-directional communication
between Tika and Tesseract to detect inline images. Is that correct?

Thanks,
Cristi

Reply via email to