Re: Support Tesseract in Apache Solr

2020-02-11 Thread Edward Ribeiro
I second Jörn: don't deploy Tesseract + Tika on the same server as Solr. Tesseract, especially with OCR enabled, will drain machine resources that could otherwise be used for indexing/searching. In addition, any malformed PDF could potentially shut down the Solr server. Best bet would be to use tik

Re: Support Tesseract in Apache Solr

2020-02-11 Thread Jörn Franke
Honestly, I would not run Tesseract on the same server as Solr. It takes a lot of resources and may negatively impact Solr. Just write a small program using Tika + Tesseract that runs on a different server/container and posts the results to Solr. About your question: Probably Tika (a dependency
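A minimal sketch of such a "small program", written as a shell script to match the commands used elsewhere in this thread. Note that it swaps in a simplification: it calls the `tesseract` command-line tool directly on image files instead of going through Tika (a Tika-based program would also handle PDFs and other formats). It assumes `curl` is installed, and the Solr URL, the core name `docs`, and the field names `id` and `content` are hypothetical placeholders for your own setup. The text is sent using Solr's XML update format to the core's `/update` handler.

```sh
#!/bin/sh
# Sketch: run OCR on a separate host/container and send only the text to Solr.
# Hypothetical assumptions: SOLR_URL points at a core "docs" whose schema has
# an "id" field and a text field named "content". Adjust to your setup.
SOLR_URL="${SOLR_URL:-http://solr-host:8983/solr/docs}"

# Drop control characters XML 1.0 does not allow (Tesseract may emit a form
# feed between pages), then escape &, < and > for XML element content.
xml_escape() {
  tr -d '\001-\010\013\014\016-\037' |
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Build a Solr XML <add> message for one document.
# $1 = document id, stdin = extracted text.
build_doc_xml() {
  doc_id=$(printf '%s' "$1" | xml_escape)
  printf '<add><doc><field name="id">%s</field><field name="content">' "$doc_id"
  xml_escape
  printf '</field></doc></add>'
}

# OCR every image passed on the command line and post the text to Solr.
for img in "$@"; do
  # Using "stdout" as the output base makes Tesseract print the recognized
  # text instead of writing it to a .txt file.
  if ! text=$(tesseract "$img" stdout 2>/dev/null); then
    echo "OCR failed for $img, skipping" >&2
    continue
  fi
  printf '%s' "$text" | build_doc_xml "$img" |
    curl -sS -H 'Content-Type: text/xml' --data-binary @- \
      "$SOLR_URL/update?commit=true" >/dev/null ||
    echo "posting $img to Solr failed" >&2
done
```

Usage would be something like `SOLR_URL=http://solr-host:8983/solr/docs ./ocr-to-solr.sh scan1.png scan2.png` (the script name is hypothetical). Because Tesseract only runs in this separate process, a crash or a resource spike during OCR stays away from the Solr JVM, which is the point made above.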

Support Tesseract in Apache Solr

2020-02-11 Thread Karan Jain
Hi all, Solr version 7.6.0 is running on my local machine. I have installed Tesseract with the following steps:

    yum install tesseract
    echo export PATH=$PATH:/usr/share/tesseract >>~/.bash_profile
    echo export TESSDATA_PREFIX=/usr/share/tesseract >>~/.bash_profile

Now the deployed Solr is suppo
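One thing worth checking after these steps: `~/.bash_profile` is only read by login shells, and only for that user, so a Solr process started another way (for example as a service under a different account) may not see the new `PATH` or `TESSDATA_PREFIX`. A quick sanity check, as a sketch assuming a POSIX shell; `check_tesseract` is a hypothetical helper name. Run it as the same user and in the same environment that starts Solr:

```sh
#!/bin/sh
# Hypothetical helper: report whether Tesseract is usable from this environment.
# Returns 0 if the tesseract binary is on PATH, 1 otherwise.
check_tesseract() {
  if ! command -v tesseract >/dev/null 2>&1; then
    echo "tesseract not found on PATH: $PATH" >&2
    return 1
  fi
  echo "tesseract found at: $(command -v tesseract)"
  echo "TESSDATA_PREFIX=${TESSDATA_PREFIX:-<unset>}"
  # The first line of the version output names the Tesseract release
  # (older releases print it on stderr, hence 2>&1).
  tesseract --version 2>&1 | head -n 1
  # Lists the installed language data; if none are listed, OCR cannot run.
  tesseract --list-langs 2>&1
  return 0
}

check_tesseract || echo "Tesseract is not usable from this environment" >&2
```

If the check passes in your interactive shell but Solr still cannot find Tesseract, the difference is most likely in the environment the Solr process was started with.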