On 2/8/2017 9:08 AM, Anatharaman, Srinatha (Contractor) wrote:
> Thank you for your reply.
> The other archive message you mentioned was posted by me.
> I am new to Solr. When you say to process the documents in a program
> outside Solr, what exactly should I do?
>
> I have lots of text documents that I need to index. What should I apply
> to these documents before loading them into Solr?

Did you not see Erick's reply, where he provided the following link, and
said that the program shown there was a decent guide to writing your own
program to handle Tika processing?

https://lucidworks.com/2012/02/14/indexing-with-solrj/

The blog post includes code that talks to a database, which would be
fairly easy to remove/change.  Some knowledge of how to write Java
programs is required.  Tika is a Java API, so writing the program in
Java is a prerequisite.
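Here is a minimal sketch of the overall shape of such a program in plain Java. It walks the files, extracts text on the client, skips anything the parser fails on, and sends the rest to Solr in batches. To keep it self-contained, the two interfaces are hypothetical hooks, not real Tika or SolrJ types. In a real program, TextExtractor would wrap something like Tika's AutoDetectParser with a BodyContentHandler, and BatchSink would build SolrInputDocument objects and pass them to SolrJ's SolrClient.add. The "id" and "content" field names are also assumptions, so use whatever your schema defines.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DocIndexer {
    /** Hypothetical hook: in a real program, wrap Tika here (e.g. AutoDetectParser + BodyContentHandler). */
    interface TextExtractor {
        String extract(Path file) throws Exception;
    }

    /** Hypothetical hook: in a real program, convert to SolrInputDocument and call SolrClient.add(...). */
    interface BatchSink {
        void send(List<Map<String, Object>> batch) throws Exception;
    }

    private DocIndexer() {}

    /** Extracts text from each file on the client side, sends documents in batches, returns how many were sent. */
    static int indexAll(List<Path> files, TextExtractor extractor, BatchSink sink, int batchSize) throws Exception {
        List<Map<String, Object>> batch = new ArrayList<>();
        int indexed = 0;
        for (Path file : files) {
            String text;
            try {
                text = extractor.extract(file);
            } catch (Exception e) {
                // A document the parser chokes on is logged and skipped; the run continues.
                System.err.println("skipping " + file + ": " + e);
                continue;
            }
            Map<String, Object> doc = new HashMap<>();
            doc.put("id", file.toString());   // hypothetical field names; match your schema
            doc.put("content", text);
            batch.add(doc);
            indexed++;
            if (batch.size() >= batchSize) {
                sink.send(new ArrayList<>(batch));
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            sink.send(new ArrayList<>(batch)); // flush the final partial batch
        }
        return indexed;
    }
}
```

Committing (SolrClient.commit, or autoCommit in solrconfig.xml) is left out of the sketch.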

The entire point of this idea is to take the Tika processing out of the
Solr server(s).  If Tika runs within Solr, it can cause Solr to hang or
crash.  The authors of Tika try as hard as they can to make sure it
works well, but the software is dealing with proprietary data formats
that are not publicly documented.  Sometimes one of those documents can
cause Tika to explode.  A crash in your client code won't take down Solr,
and it is likely easier to recover from a crash at that level.
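Inside the client you can go a step further. The following is a sketch of one approach, which is an assumption and not something from the linked post: run each extraction on a worker thread with a time limit. That way an exception or Error thrown by the parser, or a parse that never finishes, becomes a skipped document instead of a dead or stuck indexer. The Callable passed in is a hypothetical wrapper around your own Tika call, and the timeout value is yours to choose.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class SafeExtract {
    // Daemon worker threads: a parse stuck in a loop that ignores interrupts
    // will not keep the JVM alive when the indexing program finishes.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "extract-worker");
        t.setDaemon(true);
        return t;
    });

    private SafeExtract() {}

    /**
     * Runs one extraction (hypothetically, a wrapper around a Tika parse) with a time limit.
     * Returns the extracted text, or null if the extraction threw (including an Error
     * such as StackOverflowError) or did not finish within timeoutMillis.
     */
    static String extractWithTimeout(Callable<String> extraction, long timeoutMillis) {
        Future<String> future = POOL.submit(extraction);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; the caller moves on either way
            return null;
        } catch (ExecutionException e) {
            // Whatever the extraction threw is wrapped here as the cause.
            System.err.println("extraction failed: " + e.getCause());
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return null;
        }
    }
}
```

This is still the same JVM. A parser that ignores interruption keeps using CPU in its thread, and an OutOfMemoryError can leave the whole process unhealthy. Running extraction in a separate process is the stronger form of the same isolation.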

Thanks,
Shawn
