Re: Re : Using SolrJ with Tika

Grant Ingersoll Thu, 03 Sep 2009 17:02:17 -0700

See https://issues.apache.org/jira/browse/SOLR-1411


On Sep 3, 2009, at 6:47 AM, Angel Ice wrote:

Hi

This is the solution I was testing.
I had some difficulties with AutoDetectParser, but I think it's the solution I will use in the end.
Thanks for the advice anyway :)

Regards,

Laurent




________________________________
From: Abdullah Shaikh <abdullah.sha...@viithiisys.com>
To: solr-user@lucene.apache.org
Sent: Thursday, 3 September 2009, 14:31:10
Subject: Re: Using SolrJ with Tika

Hi Laurent,
I am not sure if this is what you need, but you can extract the content from
the uploaded document (MS Docs, PDF, etc.) using Tika and then send it to Solr
for indexing.

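// Extract the text with Tika. One way is AutoDetectParser, which picks a
// parser from the file type; "stream" is the uploaded document's InputStream.
BodyContentHandler handler = new BodyContentHandler();
new AutoDetectParser().parse(stream, handler, new Metadata());
String CONTENT = handler.toString();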

and then,

SolrInputDocument doc = new SolrInputDocument();
doc.addField("DOC_CONTENT", CONTENT);

solrServer.add(doc);
solrServer.commit();


On Wed, Sep 2, 2009 at 5:26 PM, Angel Ice <lbil...@yahoo.fr> wrote:
Hi everybody.

I hope this is the right place for questions; if not, sorry.

I'm trying to index rich documents (PDF, MS docs, etc.) in Solr/Lucene.
I have seen a few examples explaining how to use Tika to solve this, but
most of these examples use curl to send documents to Solr, or an HTML
POST with an input file.
I'd like to do it in pure Java.
Is there a way to use SolrJ to index the documents with the
ExtractingRequestHandler of Solr, or at least to get the extracted XML back
(with the extract.only option)?
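
For readers following the thread: the curl examples mentioned above boil down to an ordinary HTTP POST, so one way to do the same thing from plain Java is sketched below. This is only an illustration under stated assumptions, not the approach the replies settle on. The class and method names (ExtractRequestSketch, buildUrl, postFile) are hypothetical, and the handler path and request parameter names are supplied by the caller because they depend on the Solr version and configuration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Hypothetical helper: plain-JDK HTTP POST to an extracting handler.
class ExtractRequestSketch {

    // URL-encodes one query-string component as UTF-8.
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    // Appends the parameters to the handler URL as an encoded query string.
    // Parameter order follows the map's iteration order.
    static String buildUrl(String handlerUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(handlerUrl);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(enc(e.getKey())).append('=').append(enc(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    // Sends the raw file bytes as the request body and returns the HTTP status.
    static int postFile(String url, Path file, String contentType) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", contentType);
        try (OutputStream out = conn.getOutputStream()) {
            Files.copy(file, out);
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

A call might look like postFile(buildUrl("http://localhost:8983/solr/update/extract", params), Paths.get("report.pdf"), "application/pdf"), where params holds something like literal.id=doc1. The host, port, handler path, file name and parameter names here are placeholders to check against your own Solr setup.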

Many thanks.

Laurent.


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:

http://www.lucidimagination.com/search
