Piero,

it sounds like you're looking for an integration of Solr Cell and Solr's DIH (DataImportHandler) facility -- a feature that isn't implemented yet (but the issue is already tracked as SOLR-1358).

As a workaround, you could store the extracted contents in plain text files (either by using Solr Cell, or by using Apache Tika directly, which is what Solr Cell uses under the hood). Afterwards, you could use DIH's XPathEntityProcessor (to read the metadata from your XML files) in conjunction with DIH's PlainTextEntityProcessor (to read the previously created text files).
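A data-config.xml for that could look roughly like the sketch below. This is only an illustration: the file paths, the XPath expressions, and the field names (id, myfield-1, content) are placeholders you'd replace with your own, and the exact layout of your metadata XML determines the forEach/xpath values.

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- Read the metadata fields from the XML file -->
    <entity name="meta"
            processor="XPathEntityProcessor"
            url="/path/to/metadata/doc1.xml"
            forEach="/doc">
      <field column="id"        xpath="/doc/id"/>
      <field column="myfield-1" xpath="/doc/myfield-1"/>

      <!-- Nested entity: read the previously extracted plain text.
           PlainTextEntityProcessor puts the whole file content into
           the implicit "plainText" column, mapped here to "content". -->
      <entity name="body"
              processor="PlainTextEntityProcessor"
              url="/path/to/extracted/doc1.txt">
        <field column="plainText" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

In practice you'd probably drive the nested entity's url from a field of the outer entity (e.g. url="${meta.textfile}") so one DIH run covers all documents.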

Another workaround would be to pass the metadata content as literal parameters along with the /update/extract request, as described in [1]. This would require you to write a small program that constructs and sends appropriate POST requests by parsing your XML metadata files.
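Since Piero asked for a Java solution, here is a minimal sketch of the second workaround using only the JDK. It builds the literal.<field> query parameters for an /update/extract request; the host/port, core path, and field names are assumptions, and the values would come from parsing your XML metadata files. (If you use SolrJ instead, ContentStreamUpdateRequest with setParam("literal.id", ...) does the same thing.)

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtractWithLiterals {

    // Builds the query string for /update/extract, passing each metadata
    // field as a literal.<field> parameter, as described in [1].
    static String buildQuery(Map<String, String> literals) {
        try {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, String> e : literals.entrySet()) {
                if (sb.length() > 0) sb.append('&');
                sb.append("literal.")
                  .append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
            return sb.toString();
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Hypothetical metadata, as it would come out of your XML files.
        Map<String, String> literals = new LinkedHashMap<String, String>();
        literals.put("id", "doc-1");
        literals.put("myfield-1", "value from the XML metadata file");

        // Assumed Solr location; adjust to your setup. You would then POST
        // the raw file bytes (e.g. the PDF) to this URL, with the file's
        // Content-Type set on the request.
        String url = "http://localhost:8983/solr/update/extract?"
                   + buildQuery(literals) + "&commit=true";
        System.out.println(url);
    }
}
```

One request per document then both extracts the file content (via Tika) and indexes your metadata fields, which is exactly the single-request behaviour asked for below.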

Best,
Sascha

[1] http://wiki.apache.org/solr/ExtractingRequestHandler#Literals

Rodolico Piero wrote:
Hi,

I need to index the contents of a file (doc, pdf, etc.) together with a set of
custom metadata specified in XML, like a standard Solr request.
From the documentation I can extract the contents of a file with the
"/update/extract" request (Tika) and index metadata with a second
"/update" request by passing the XML. How do I do it all in a single
request? (without using curl, but using a Java HTTP library or SolrJ). For
example (although I know that this is not correct):

<add>
  <doc>
    <field name="id"></field>
    <field name="myfield-1"></field>
    <field name="myfield-n"></field>
    <field name="content">content of the extracted file (text)</field>
  </doc>
</add>

So I can search either by metadata or by full text on the content.
Sorry for my English ...

Thanks a lot.

Piero
