Hi,

The problem you've described, integrating DataImportHandler (to traverse the XML file and get the document URLs) with Solr Cell (to extract the content afterwards), is already addressed in issue SOLR-1358 (https://issues.apache.org/jira/browse/SOLR-1358).
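
For reference, here is a minimal data-config.xml sketch of that approach. It assumes a Solr release that ships TikaEntityProcessor, which is the DataImportHandler/Tika integration tracked in that issue. XPathEntityProcessor walks each <doc> in your metadata file, and a nested TikaEntityProcessor entity fetches and parses the file named in its URL field. The file path, the data source names, and the target field "content" are placeholders. The fields must exist in your schema, and the DataImportHandler must be registered in solrconfig.xml.

```xml
<dataConfig>
  <!-- Text data source for reading the metadata XML -->
  <dataSource type="FileDataSource" name="xmlSource" encoding="UTF-8"/>
  <!-- Binary data source for the documents themselves (files on disk);
       use BinURLDataSource instead if the URL field holds http(s) URLs -->
  <dataSource type="BinFileDataSource" name="binSource"/>
  <document>
    <!-- One Solr document per <doc> element in the metadata file
         (/path/to/metadata.xml is a placeholder) -->
    <entity name="meta"
            processor="XPathEntityProcessor"
            dataSource="xmlSource"
            url="/path/to/metadata.xml"
            forEach="/add/doc">
      <field column="id"     xpath="/add/doc/field[@name='id']"/>
      <field column="author" xpath="/add/doc/field[@name='author']"/>
      <field column="size"   xpath="/add/doc/field[@name='size']"/>
      <field column="url"    xpath="/add/doc/field[@name='URL']"/>
      <!-- Fetch the file referenced by the parent row and extract its text with Tika -->
      <entity name="content"
              processor="TikaEntityProcessor"
              dataSource="binSource"
              url="${meta.url}"
              format="text">
        <!-- "content" is a hypothetical schema field for the extracted body -->
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

A single full-import request to the DataImportHandler then indexes every document listed in the metadata file, which is the "one request for many documents" behaviour you asked about.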

Best,
Sascha

Kerwin wrote:
Hi,

I am new to this forum and would like to know whether the functionality described
below has already been developed or exists in Solr. If it does not, is it a
good idea, and could I contribute it?

We need to index multiple documents in different formats, so we use Solr
with Tika (Solr Cell).

Question:
Can you index both the metadata and the content of multiple documents iteratively
in Solr?
For example, I have an XML file with metadata and links to the documents'
content. There are many documents in this XML file, and I would like to index
them all without sending a separate request for each document.

Example of the XML:
<add>
  <doc>
    <field name="id">34122</field>
    <field name="author">Michael</field>
    <field name="size">3MB</field>
    <field name="URL">URL of the document</field>
  </doc>
  <doc>...</doc>
  ...
  <doc>...</doc>
</add>

I need to index all these documents by sending this XML in a single request.
The documents to be indexed could be on a file system.

I have altered the Solr code to be able to do this, but is there already an
existing feature for it?

