Hello,

I am also interested in this. We found Tesseract to be quite slow, and it has
some limitations as well, so we want to try out some alternatives.

Regards,
Sandeep Kulkarni

From: Cristian Zamfir <cri...@cyberhaven.com>
Sent: Thursday, August 3, 2023 3:43 PM
To: user@tika.apache.org
Subject: [External] Using Tika with another OCR engine

Hello,

I am interested in trying out Tika with a different OCR engine and am wondering
how Tesseract is integrated. Is it possible to write a plugin that calls a
different engine? For images this is much easier: one can just detect the file
type and use an OCR engine instead. For scanned PDFs, I assume there is some
bi-directional communication between Tika and Tesseract to detect inline
images. Is that correct?

Thanks,
Cristi

