On 7/21/22 13:12, jorge hernandez wrote:
SimplePostTool: WARNING: IOException while reading response:
java.io.FileNotFoundException:
http://localhost:8983/solr/mynescore/update/extract?resource.name=%3cpath_of_the_files>
The problem here is that the _default configset does NOT create the
/update/extract handler, which you need to extract data from document
types like HTML, Word, PDF, etc.
That handler (the feature is also called SolrCell) requires loading
additional jars, because it is not included in the webapp. It ships in
the download as a module.
Note that the following document is for Solr 9.0 ... earlier versions
will be slightly different.
https://solr.apache.org/guide/solr/latest/indexing-guide/indexing-with-tika.html
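As a rough sketch of what adding the handler can look like, the Python
below builds a Config API request that registers /update/extract on the
core. It assumes the extraction module has already been enabled as that
page describes, and that Solr is on the default localhost:8983. The core
name comes from your error message; the "defaults" values and the helper
name build_add_handler_request are illustrative assumptions, not
something your setup necessarily needs.

```python
import json
import urllib.request

# Core name taken from the error message above; host and port are Solr defaults.
CONFIG_URL = "http://localhost:8983/solr/mynescore/config"


def build_add_handler_request(url=CONFIG_URL):
    """Build (but do not send) a Config API request that adds /update/extract."""
    command = {
        "add-requesthandler": {
            "name": "/update/extract",
            "class": "solr.extraction.ExtractingRequestHandler",
            # Assumed defaults: lowercase field names, map body text to _text_.
            "defaults": {"lowernames": "true", "fmap.content": "_text_"},
        }
    }
    return urllib.request.Request(
        url,
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To actually send it (needs a running Solr with the module loaded):
#     with urllib.request.urlopen(build_add_handler_request()) as resp:
#         print(resp.status, resp.read().decode("utf-8"))
```

If the module jars are not loaded, Solr should reject the request with a
class-loading error, which is a quick way to tell the two problems apart.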
One final note ... we STRONGLY recommend not using SolrCell in
production. Tika can be unstable -- some documents can cause it to
consume huge amounts of memory, or even to crash. If Tika is running
inside Solr when that happens, then Solr itself will suffer the
effects. Instead, you should run Tika in a separate process with crash
handling, so that Solr remains operational if there is a problem with
extraction.
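
To make the separate-process idea concrete, here is a minimal sketch in
Python, not a production indexer. It assumes the Tika app jar is
available as tika-app.jar in the working directory (a hypothetical
path), that 512 MB is an acceptable heap cap, and that the core has a
text field named content_txt (also an assumption). Each file is
extracted in its own child JVM with a timeout, so a hang or crash costs
one document and Solr never sees it.

```python
import json
import subprocess
import urllib.request

# Hypothetical jar location and heap cap; adjust both for your install.
TIKA_CMD = ["java", "-Xmx512m", "-jar", "tika-app.jar", "--text"]
# Core name taken from the error message above.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mynescore/update?commit=true"


def extract_text(path, cmd=TIKA_CMD, timeout=60):
    """Run the extractor in a child process; return its text, or None on failure."""
    try:
        result = subprocess.run(
            cmd + [path],
            capture_output=True,
            timeout=timeout,  # the child is killed if it runs past this
            check=True,       # a non-zero exit code counts as a failure
        )
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, OSError):
        # A hang, crash, or missing binary only loses this one document.
        return None
    return result.stdout.decode("utf-8", errors="replace")


def build_update_request(doc_id, text, url=SOLR_UPDATE_URL):
    """Build a JSON update request carrying the already-extracted text."""
    # content_txt is an assumed field name; use whatever your schema defines.
    body = json.dumps([{"id": doc_id, "content_txt": text}]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_file(path):
    """Extract one file out of process, then send plain text to Solr."""
    text = extract_text(path)
    if text is None:
        return False  # extraction failed; Solr was never involved
    with urllib.request.urlopen(build_update_request(path, text)) as resp:
        return resp.status == 200
```

With this layout Solr only ever receives plain text through the normal
/update handler, so the /update/extract handler is not needed at all.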
Thanks,
Shawn