I second Jorn: don't deploy Tesseract + Tika on the same server as Solr.
Tesseract, especially with OCR enabled, will drain machine resources
that could be used for indexing/searching. In addition, any
malformed PDF could potentially shut down the Solr server. The best bet would be
to use tik
Honestly, I would not run Tesseract on the same server as Solr. It takes a lot
of resources and may negatively impact Solr. Just write a small program using
Tika+Tesseract that runs on a different server/container and posts the
results to Solr.
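A minimal sketch of such a program, in Python with only the standard library. It assumes a Tika Server runs on a separate host with Tesseract installed there, and uses Tika Server's `PUT /tika` endpoint with `Accept: text/plain` to get extracted text, then posts that text to Solr's JSON update handler. The host names, the collection name `docs` and the field name `content_txt` are hypothetical placeholders. Only plain text reaches Solr, so heavy OCR work and malformed PDFs stay on the extraction host.

```python
import json
import urllib.request

# Hypothetical endpoints: a Tika Server (with Tesseract installed) on a
# separate host, and a Solr collection named "docs". Adjust to your setup.
TIKA_URL = "http://tika-host:9998/tika"
SOLR_UPDATE_URL = "http://solr-host:8983/solr/docs/update?commit=true"


def build_tika_request(pdf_bytes: bytes) -> urllib.request.Request:
    """PUT the raw document to Tika Server and ask for plain-text output."""
    return urllib.request.Request(
        TIKA_URL,
        data=pdf_bytes,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def build_solr_request(doc_id: str, text: str) -> urllib.request.Request:
    """POST one document to Solr's JSON update handler.

    The field names "id" and "content_txt" are assumptions; use whatever
    your schema defines.
    """
    payload = json.dumps([{"id": doc_id, "content_txt": text}]).encode("utf-8")
    return urllib.request.Request(
        SOLR_UPDATE_URL,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def extract_and_index(doc_id: str, pdf_bytes: bytes, timeout: float = 120.0) -> None:
    """Extract text on the Tika host, then send only the text to Solr."""
    with urllib.request.urlopen(build_tika_request(pdf_bytes), timeout=timeout) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    with urllib.request.urlopen(build_solr_request(doc_id, text), timeout=timeout) as resp:
        resp.read()  # non-2xx responses raise urllib.error.HTTPError
```

Usage would be something like `extract_and_index("report-1", data)` where `data` is the PDF read in binary mode. Keeping request construction separate from sending makes it easy to add retries or a queue later without touching the payload logic.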
About your question: Probably Tika (a dependency
Hi All,
Solr 7.6.0 is running on my local machine. I have installed
Tesseract with the following steps:
yum install tesseract
echo export PATH=$PATH:/usr/share/tesseract >>~/.bash_profile
echo export TESSDATA_PREFIX=/usr/share/tesseract >>~/.bash_profile
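To confirm that these exports are actually visible to the process that will invoke Tesseract, a small check like the following can be run as the same user, and started the same way, as that process. It is a Python sketch; the function name `check_tesseract_env` is hypothetical. Keep in mind that ~/.bash_profile is read by login shells, so a Solr instance started from a service script may not see these variables.

```python
import os
import shutil


def check_tesseract_env(env=None):
    """Return a list of problems with the OCR environment (empty if none).

    `env` defaults to the current process environment; passing a dict
    lets you check a specific environment instead.
    """
    env = os.environ if env is None else env
    problems = []
    # shutil.which searches only the PATH value passed in explicitly.
    if shutil.which("tesseract", path=env.get("PATH", "")) is None:
        problems.append("tesseract not found on PATH")
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        problems.append("TESSDATA_PREFIX is not set")
    elif not os.path.isdir(prefix):
        problems.append(f"TESSDATA_PREFIX points to a missing directory: {prefix}")
    return problems


if __name__ == "__main__":
    for problem in check_tesseract_env() or ["OK"]:
        print(problem)
```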
Now the deployed Solr is suppo