Re: Getting indexed content of files using ExtractingRequestHandler

Erick Erickson Sun, 14 Jul 2013 05:19:57 -0700

Well, cURL is generally not what people use in production. What I'd consider
is using SolrJ (from which you can call Tika directly) and then storing the raw
PDF (or whatever) document in a binary-type field in Solr.


Here's an example (with DB indexing mixed in, but you should be able
to pull that part out).
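
If you'd rather stay with the ExtractingRequestHandler for now, the request
you were trying to build with cURL can also be issued from code. What follows
is only a rough sketch, not the example referred to above. It assumes the
handler is mapped at /update/extract (the usual setup, but check your
solrconfig.xml), that your database columns can go in as literal.<field>
parameters, and that fmap.content can rename the extracted body to your
'filecontent' field. The class name, the base URL and the sample values are
all made up for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr or SolrJ.
class ExtractRequest {

    // URL-encode one query-string component as UTF-8.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Build the extract URL: each database column becomes literal.<field>,
    // and fmap.content maps the extracted body text to contentField.
    static String buildUrl(String solrUrl, Map<String, String> dbFields, String contentField) {
        StringBuilder sb = new StringBuilder(solrUrl).append("/update/extract?");
        for (Map.Entry<String, String> e : dbFields.entrySet()) {
            sb.append("literal.").append(enc(e.getKey()))
              .append('=').append(enc(e.getValue())).append('&');
        }
        sb.append("fmap.content=").append(enc(contentField));
        sb.append("&commit=true"); // convenient for testing; batch your commits in production
        return sb.toString();
    }

    // POST the raw file bytes as the request body and return the HTTP status code.
    static int postFile(String url, Path file) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/pdf");
        try (OutputStream out = conn.getOutputStream()) {
            Files.copy(file, out);
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: ExtractRequest <solr-url> <pdf-path>");
            return;
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "1");            // stands in for $data->id
        fields.put("name", "some name");  // stands in for $data->name
        String url = buildUrl(args[0], fields, "filecontent");
        System.out.println(url + " -> HTTP " + postFile(url, Paths.get(args[1])));
    }
}
```

Because the literal.* values and the extracted text go in with the same
request, they end up in one Solr document, so there is no need to fetch the
text back and add it to a second document. The target field has to exist in
your schema and be stored if you want to read the content back.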

Best
Erick

On Sun, Jul 14, 2013 at 4:05 AM, xan <p...@prateeksachan.com> wrote:
> Hi,
>
> I'm using the PHP Solr client (ver: 1.0.2).
>
> I'm indexing the contents through my database.
> Suppose $data is a stdClass object having id, name, title, etc. from a
> database entry.
>
> Next, I declare a Solr document and assign fields to it:
>
> $doc = new SolrInputDocument();
> $doc->addField ('id' , $data->id);
> $doc->addField ('name' , $data->name);
> ....
> ....
>
> I wanted to know how I can store the contents of a pdf file (whose path I've
> stored in $data->filepath) in the same Solr document, say in a field
> ('filecontent').
>
> Referring to the wiki, I was unable to figure out the proper cURL request
> for achieving this. I was able to create a completely new solr document but
> how do I get the contents of the pdf file in the same solr document so that
> I can store that in a field?
>
>
> $doc = new SolrInputDocument();
> $doc->addField ('id' , $data->id);
> $doc->addField ('name' , $data->name);
> ....
> ....
> // fire the cURL request here, referring to the file at $data->filepath
> $doc->addField ('filecontent' , $filecontent); // $filecontent: content of the pdf file
>
> Also, instead of firing the raw cURL request, is there a better way? I don't
> know if the current PECL SOLR Client 1.0.2 has the feature of indexing pdf
> files.
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Getting-indexed-content-of-files-using-ExtractingRequestHandler-tp4077856.html
> Sent from the Solr - User mailing list archive at Nabble.com.
