Hi,
I have just started using Solr. I am using the SolrJ client, but at the moment 
I upload the file directly to Solr. I think we could run Tika in our own code 
first.

Here I send the file directly to Solr, which does the text extraction:

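import java.io.File;
import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
solr.setRequestWriter(new BinaryRequestWriter());

// send the request to the extracting handler instead of the plain /update
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
// read a file
File file = new File("tutorial.pdf");
up.addFile(file);
// literal.id supplies the value of the "id" field for this document
up.setParam("literal.id", "tutorial.pdf");
// commit right away (waitFlush = true, waitSearcher = true)
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
solr.request(up);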

So what we would need to do is run Tika on the client side, before anything is 
sent to Solr.
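
To make the idea a bit more concrete, here is a minimal sketch of the step that 
would come after Tika has extracted the text on the client (the Tika call itself 
is not shown). It builds a Solr XML add message by hand from the extracted text 
plus any extra fields. The class SolrAddXmlSketch, its method names, and the 
"text" and "author" field names are my own placeholders, not part of SolrJ or 
Tika, and the fields are assumed to exist in the schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of SolrJ: builds a Solr XML <add> message by hand.
final class SolrAddXmlSketch {

    // Escape the characters that are special in XML text and attribute values.
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Turn a field-name -> value map into a single-document <add> message.
    static String buildAddXml(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            xml.append("<field name=\"").append(escapeXml(e.getKey())).append("\">")
               .append(escapeXml(e.getValue()))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "tutorial.pdf");
        // Placeholder: in practice this would be the text Tika extracted.
        fields.put("text", "text extracted by Tika goes here");
        // Hypothetical custom field; it must be defined in the schema.
        fields.put("author", "canal");
        System.out.println(buildAddXml(fields));
    }
}
```

The resulting string could then be posted to the normal /update handler, so the 
extracted text and the extra fields arrive in one request.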

I have a question about up.setParam: can I use it to create my own fields?
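
From what I understand of the extract handler, a request parameter named 
literal.<fieldname> supplies a value for that field, so the literal.id pattern 
should extend to other fields as long as they are defined in the schema. Below 
is a small sketch of how such parameters could be prepared. The class 
LiteralParamsSketch, its methods, and the "category" field are hypothetical 
names of mine, not part of SolrJ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of SolrJ: prepares literal.* parameters.
final class LiteralParamsSketch {

    // Prefix every field name with "literal." so the extract handler
    // treats the value as a field value rather than as a handler option.
    static Map<String, String> toLiteralParams(Map<String, String> fields) {
        Map<String, String> params = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            params.put("literal." + e.getKey(), e.getValue());
        }
        return params;
    }

    // URL-encode the parameters into a query string, in insertion order.
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "tutorial.pdf");
        fields.put("category", "user guide"); // hypothetical custom field
        // prints literal.id=tutorial.pdf&literal.category=user+guide
        System.out.println(toQueryString(toLiteralParams(fields)));
    }
}
```

With SolrJ, each entry of the map returned by toLiteralParams could be passed to 
up.setParam(name, value) instead of being encoded into a URL by hand.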
Regards,
canal

________________________________
From: Steve Johnson <st...@parisgroup.net>
To: solr-user@lucene.apache.org
Sent: Sun, June 27, 2010 6:50:01 AM
Subject: How to index rich document with XML payload?

Greetings,

I am new to Solr, but have gotten as far as successfully indexing documents 
both by sending XML describing the document and by sending the document itself 
using "update/extract".  What I want to do now is, in effect, do both of these 
on each of my documents.  I want to be able to have Tika do its magic first, 
and then I want to add additional fields to my document entries using XML.

Is there any way to do this?  In general, is there any way to apply multiple 
update requests to a single document entry?

I do understand that I can put literal values on the "update/extract" URL to do 
what I'm asking.  This is what I'll have to do if I can't figure out another 
way, but it seems messy to me...I'd much rather send an XML payload.

TIA for any help.
