On 7/21/22 13:12, jorge hernandez wrote:
SimplePostTool: WARNING: IOException while reading response:
java.io.FileNotFoundException:
http://localhost:8983/solr/mynescore/update/extract?resource.name=%3cpath_of_the_files>

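The problem here is that the _default configset does NOT define the /update/extract handler, which you need in order to extract data from document types like HTML, Word, PDF, etc.  Because nothing is registered at that path, Solr answers with a 404, and the post tool reports that as a FileNotFoundException.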

This feature (also called SolrCell) requires loading additional jars, because it is not included in the webapp.  It ships in the download as a module.

Note that the following document is for Solr 9.0 ... earlier versions will be slightly different.

https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-with-tika.html
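
For what it's worth, here is a rough sketch of registering the handler through the Config API rather than editing solrconfig.xml by hand.  It is written from my reading of that guide page, so treat it as a starting point and check it against the page for your version.  The handler class name and the "defaults" values are what I believe the guide uses; the function names are made up for illustration.  It also assumes Solr 9 was started with the extraction module enabled (I believe that is -Dsolr.modules=extraction), since the handler class cannot load otherwise.

```python
import json
import urllib.request


def build_extract_handler_command(handler_name="/update/extract"):
    """Build a Config API command that registers the extracting handler.

    The class name and defaults are assumptions taken from the Solr 9
    guide; adjust them for your own schema and Solr version.
    """
    return {
        "add-requesthandler": {
            "name": handler_name,
            "class": "solr.extraction.ExtractingRequestHandler",
            "defaults": {"lowernames": "true", "captureAttr": "true"},
        }
    }


def register_extract_handler(solr_url, core):
    """POST the command to the core's /config endpoint; return the response body."""
    body = json.dumps(build_extract_handler_command()).encode("utf-8")
    request = urllib.request.Request(
        f"{solr_url}/solr/{core}/config",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Example call (needs a running Solr, so it is left commented out):
# print(register_extract_handler("http://localhost:8983", "mynescore"))
```

Once the handler is registered, the same post command that produced the error above should reach /update/extract instead of getting a 404.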

One final note ... we STRONGLY recommend not using SolrCell in production.  Tika can be unstable -- some documents can cause it to consume huge amounts of memory, and even crash.  If Tika is running inside Solr when that happens, then Solr itself will suffer the effects.  Instead, you should run Tika in a separate process with crash handling, so that Solr remains operational if there is a problem with extraction.
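
To make that last suggestion concrete, here is a minimal sketch of the separate-process approach, assuming a standalone Tika Server listening on its default port (9998) and a small client script that sends the extracted text to Solr's regular /update handler.  The field name content_txt, the use of the file path as the document id, and all of the function names are assumptions for illustration, not anything Solr requires.  If Tika hangs or crashes on a file, the script records the failure and moves on, and Solr never sees the bad document.

```python
import json
import urllib.request


def extract_text(file_path, tika_url="http://localhost:9998/tika", timeout=60):
    """PUT the raw file to Tika Server and return the extracted plain text."""
    with open(file_path, "rb") as handle:
        payload = handle.read()
    request = urllib.request.Request(
        tika_url, data=payload, headers={"Accept": "text/plain"}, method="PUT"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def build_solr_doc(doc_id, text):
    """Build one Solr document; the content_txt field name is an assumption."""
    return {"id": doc_id, "content_txt": text}


def index_docs(docs, solr_update_url):
    """POST a JSON array of documents to Solr's /update handler and commit."""
    body = json.dumps(docs).encode("utf-8")
    request = urllib.request.Request(
        solr_update_url + "?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_files(paths, solr_update_url):
    """Extract each file via Tika, index the successes, return the failures."""
    docs, failed = [], []
    for path in paths:
        try:
            docs.append(build_solr_doc(path, extract_text(path)))
        except OSError as error:
            # Covers unreadable files, timeouts, refused connections, and
            # HTTP errors from Tika; Solr is not involved at this point.
            failed.append((path, str(error)))
    if docs:
        index_docs(docs, solr_update_url)
    return failed
```

Something like index_files(["report.pdf"], "http://localhost:8983/solr/mynescore/update") would then index whatever Tika could extract and hand back a list of the files it could not.  A real setup would also want to restart the Tika process when it dies, which is the "crash handling" part.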

Thanks,
Shawn
