Hi,
I am new to Solr and tried to follow the guide to upload PDF data using
Tika, on Solr 8.7.0 (running on Debian 10):
https://lucene.apache.org/solr/guide/8_7/uploading-data-with-solr-cell-using-apache-tika.html
but I get an HTTP 404 error when trying to import the file.
In the Solr installation directory, after starting the example server
with
solr/bin/solr -e schemaless
I first used the Post Tool to index a PDF file as described in the
guide, which produced the following output (paths truncated using “[…]”
for privacy reasons):
bin/post -c gettingstarted example/exampledocs/solr-word.pdf -params "literal.id=doc1"
java -classpath /[…]/solr-8.7.0/dist/solr-core-8.7.0.jar -Dauto=yes -Dparams=literal.id=doc1 -Dc=gettingstarted -Ddata=files org.apache.solr.util.SimplePostTool example/exampledocs/solr-word.pdf
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/gettingstarted/update?literal.id=doc1...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file solr-word.pdf (application/pdf) to [base]/extract
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&resource.name=%2F[…]%2Fsolr-8.7.0%2Fexample%2Fexampledocs%2Fsolr-word.pdf
SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404 Not Found</h2>
<table>
<tr><th>URI:</th><td>/solr/gettingstarted/update/extract</td></tr>
<tr><th>STATUS:</th><td>404</td></tr>
<tr><th>MESSAGE:</th><td>Not Found</td></tr>
<tr><th>SERVLET:</th><td>default</td></tr>
</table>
</body>
</html>
SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException: http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&resource.name=%2F[…]%2Fsolr-8.7.0%2Fexample%2Fexampledocs%2Fsolr-word.pdf
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/gettingstarted/update?literal.id=doc1...
Time spent: 0:00:00.038
As a result, no changes are visible in Solr, even though the tool reports "1 files indexed".
Using curl results in the same HTTP response:
curl 'http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&commit=true' -F "myfile=@example/exampledocs/solr-word.pdf"
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404 Not Found</h2>
<table>
<tr><th>URI:</th><td>/solr/gettingstarted/update/extract</td></tr>
<tr><th>STATUS:</th><td>404</td></tr>
<tr><th>MESSAGE:</th><td>Not Found</td></tr>
<tr><th>SERVLET:</th><td>default</td></tr>
</table>
</body>
</html>
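In case it helps anyone reproduce this without the Post Tool or curl, the same request can be sketched in Python using only the standard library. This is a minimal sketch, not something taken from the guide: the helper names build_extract_url and post_pdf are made up for illustration, and it assumes the server is reachable at http://localhost:8983/solr as in the output above.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(base_url, collection, params):
    """Build the /update/extract URL for a collection (string work only, no network)."""
    query = urlencode(params)
    return f"{base_url.rstrip('/')}/{collection}/update/extract?{query}"


def post_pdf(url, pdf_path):
    """POST the raw PDF bytes to the given URL and return the HTTP status code."""
    with open(pdf_path, "rb") as fh:
        body = fh.read()
    request = Request(
        url,
        data=body,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
    try:
        with urlopen(request) as response:
            return response.status
    except HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is on the exception.
        return err.code
```

Calling post_pdf(build_extract_url("http://localhost:8983/solr", "gettingstarted", {"literal.id": "doc1", "commit": "true"}), "example/exampledocs/solr-word.pdf") sends the file to the same URL as the curl command above, so I would expect it to report the same 404 on my setup.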
Sorry if this has already been discussed somewhere; I have not been able
to find anything helpful yet.
Thank you!
Leon