Hello,

I am also interested in this. We found Tesseract to be quite slow, and it has some limitations as well, so we want to try out some alternatives.

Regards,
Sandeep Kulkarni

From: Cristian Zamfir <cri...@cyberhaven.com>
Sent: Thursday, August 3, 2023 3:43 PM
To: user@tika.apache.org
Subject: [External] Using Tika with another OCR engine

Hello,

I am interested in trying out Tika with a different OCR engine and am wondering how Tesseract is integrated. Is it possible to write a plugin to call a different engine?

Images are the easier case: I can just detect the file type and call an OCR engine directly. For scanned PDFs, I assume there is some bi-directional communication between Tika and Tesseract to detect inline images. Is that correct?

Thanks,
Cristi
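
For the image case described in the message above (detect the file type, then hand the bytes to an OCR engine), a minimal sketch of the idea could look like the following. This is not Tika's API: `detect_image_type`, `extract_text`, and the `ocr_engine` and `fallback` callables are hypothetical names used only for illustration, and the signature table covers just a few common formats.

```python
from typing import Callable, Optional

# Magic-byte prefixes for a few common image formats (illustrative, not exhaustive).
IMAGE_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
    b"II*\x00": "image/tiff",  # little-endian TIFF
    b"MM\x00*": "image/tiff",  # big-endian TIFF
}


def detect_image_type(data: bytes) -> Optional[str]:
    """Return a MIME type if the bytes start with a known image signature."""
    for signature, mime in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return mime
    return None


def extract_text(
    data: bytes,
    ocr_engine: Callable[[bytes, str], str],  # hypothetical: any OCR backend
    fallback: Callable[[bytes], str],         # hypothetical: non-image extraction
) -> str:
    """Route images to the pluggable OCR engine, everything else to the fallback."""
    mime = detect_image_type(data)
    if mime is not None:
        return ocr_engine(data, mime)
    return fallback(data)
```

Because the OCR backend is passed in as a plain callable, swapping engines means changing only that argument. The scanned-PDF case is not covered here, since it depends on how inline images are pulled out of the PDF first.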