On 2/8/2017 9:08 AM, Anatharaman, Srinatha (Contractor) wrote:
> Thank you for your reply
> Other archive message you mentioned is posted by me only
> I am new to Solr, When you say process outside Solr program. What exactly I
> should do?
>
> I am having lots of text document which I need to index, what should I apply
> to these document before loading it to Solr?

Did you not see Erick's reply, where he provided the following link, and said that the program shown there was a decent guide to writing your own program to handle Tika processing?

https://lucidworks.com/2012/02/14/indexing-with-solrj/

The blog post includes code that talks to a database, which would be fairly easy to remove/change. Some knowledge of how to write Java programs is required. Tika is a Java API, so writing the program in Java is a prerequisite.

The entire point of this idea is to take the Tika processing out of the Solr server(s). If Tika runs within Solr, it can cause Solr to hang or crash. The authors of Tika try as hard as they can to make sure it works well, but the software is dealing with proprietary data formats that are not publicly documented. Sometimes one of those documents can cause Tika to explode. Crashes in client code won't break your application, and it is likely easier to recover from a crash at that level.

Thanks,
Shawn
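
For illustration only, here is a minimal sketch of the shape such a client program might take. It uses nothing but the Java standard library so that it runs as-is. The TextExtractor and DocumentSink interfaces, the class name, and the field names "id" and "text" are all hypothetical stand-ins: in a real program the extractor would wrap a Tika parse call and the sink would wrap a SolrJ client call. The point it tries to show is that a document which blows up during extraction is caught and recorded in the client, and Solr never sees it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ClientSideExtraction {

    /** Hypothetical stand-in for the Tika step: raw bytes in, plain text out. */
    interface TextExtractor {
        String extract(String name, byte[] raw) throws Exception;
    }

    /** Hypothetical stand-in for the SolrJ step: send one document's fields to Solr. */
    interface DocumentSink {
        void send(Map<String, String> fields) throws Exception;
    }

    /**
     * Extracts and sends each file in turn. A failure on one file is caught
     * here, in the client, and the loop moves on to the next file.
     * Returns the names of the files that could not be processed.
     */
    static List<String> indexAll(Map<String, byte[]> files,
                                 TextExtractor extractor,
                                 DocumentSink sink) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, byte[]> file : files.entrySet()) {
            try {
                String text = extractor.extract(file.getKey(), file.getValue());
                Map<String, String> doc = new LinkedHashMap<>();
                doc.put("id", file.getKey());   // assumed field name
                doc.put("text", text);          // assumed field name
                sink.send(doc);
            } catch (Exception e) {
                // The bad document is recorded and skipped; nothing reaches Solr.
                failed.add(file.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("notes.txt", "plain text body".getBytes(StandardCharsets.UTF_8));
        files.put("broken.doc", new byte[0]);

        // Fake extractor: treats empty input as a document that makes the parser fail.
        TextExtractor extractor = (name, raw) -> {
            if (raw.length == 0) {
                throw new IllegalStateException("cannot parse " + name);
            }
            return new String(raw, StandardCharsets.UTF_8);
        };

        // Fake sink: collects documents in memory instead of talking to Solr.
        List<Map<String, String>> sent = new ArrayList<>();
        List<String> failed = indexAll(files, extractor, sent::add);

        System.out.println("indexed=" + sent.size() + " failed=" + failed);
    }
}
```

Catching an exception only covers a parser that fails cleanly. A parser that hangs or runs the JVM out of memory would still take down the client program, but that is a separate process from Solr, which is the reason for moving the work there.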