stream.file= tells Solr to read a local file on the server that Solr
runs on, so it has to be a complete path that is valid on that server.
To send the file over HTTP instead, use @filename so that curl opens
the file itself. That path only has to work on the machine where you
run curl, and relative paths are fine.
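
To make the difference concrete, here is a rough Python sketch of the two
request styles described above. It is only an illustration under some
assumptions: the function names are made up for this example, the
extract URL is whatever your Solr install uses, and the second function
sends the raw file bytes as the POST body, which is roughly what curl
does with --data-binary @filename.

```python
import posixpath
from urllib.parse import urlencode
from urllib.request import Request


def stream_file_url(extract_url, doc_id, server_path):
    """Build a URL that asks Solr to read a file from its own disk."""
    # stream.file is opened on the Solr server, so require a complete path.
    if not posixpath.isabs(server_path):
        raise ValueError("stream.file needs a complete path on the Solr server")
    params = {
        "literal.id": doc_id,
        "stream.file": server_path,
        "stream.contentType": "application/pdf",
    }
    # safe="/" keeps the path and content type readable in the URL.
    return extract_url + "?" + urlencode(params, safe="/")


def upload_request(extract_url, doc_id, local_path):
    """Build a POST that carries the file bytes, like curl's @filename."""
    # The file is opened on the client machine, so a relative path is fine.
    with open(local_path, "rb") as f:
        body = f.read()
    url = extract_url + "?" + urlencode({"literal.id": doc_id})
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
```

Neither function talks to Solr. To actually send the upload, pass the
returned Request to urllib.request.urlopen().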

Also, Tika does not store the PDF binary. It only extracts the text
from the PDF, and that text is what gets stored.

There's a Tika example in solr/trunk/example/exampleDIH in the current
Solr trunk. (I don't remember if it's in the Solr 1.4 release.) With
this you can save the PDF binary in one field and the extracted text
in another field. I'm doing this now with HTML.

On Tue, Feb 9, 2010 at 2:08 AM, alendo <alessandra.donn...@uniroma2.it> wrote:
>
> OK, I'm making progress (maybe :).
> I tried another curl command to send the file from a remote machine:
>
> http://mysolr:xxxx/solr/update/extract?literal.id=8514&stream.file=files/attach-8514.pdf&stream.contentType=application/pdf
>
> and the behaviour has changed: now I get an error in the Solr log file:
>
> HTTP Status 500 - files/attach-8514.pdf (No such file or directory)
> java.io.FileNotFoundException: files/attach-8514.pdf (No such file or directory)
> at java.io.FileInputStream.open(Native Method)
> at java.io.FileInputStream.<init>(FileInputStream.java:106)
> at org.apache.solr.common.util.ContentStreamBase$FileStream.getStream(ContentStreamBase.java:108)
> at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:158)
> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at
>
> etc etc...
>
> --
> View this message in context: 
> http://old.nabble.com/Posting-pdf-file-and-posting-from-remote-tp27512455p27512952.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com
