Kerwin,

Kerwin wrote:
Our approach is similar to what you have mentioned in the JIRA issue, except
that we have all metadata in the XML and not in the database. I am therefore
using a custom XmlUpdateRequestHandler to parse the XML and then calling
Tika from within the XML Loader to parse the content. So far this seems
to work.
When, and in which Solr version, do you expect the JIRA issue to be
addressed?

That's a good question. Since I'm not a Solr committer, I cannot give an estimate of when it will be released (hopefully in Solr 1.5).

-Sascha

On Mon, Nov 16, 2009 at 5:02 PM, Sascha Szott <sz...@zib.de> wrote:

Hi,

the problem you've described -- an integration of DataImportHandler (to
traverse the XML file and get the document urls) and Solr Cell (to extract
content afterwards) -- is already addressed in issue SOLR-1358 (
https://issues.apache.org/jira/browse/SOLR-1358).

Best,
Sascha


Kerwin wrote:

Hi,

I am new to this forum and would like to know if the function described
below has been developed or exists in Solr. If it does not exist, is it a
good idea, and can I contribute?

We need to index multiple documents with different formats. So we use Solr
with Tika (Solr Cell).

Question:
Can you index both metadata and content for multiple documents iteratively
in Solr?
For example, I have an XML file with metadata and links to the documents'
content. There are many documents in this XML and I would like to index
them all without firing multiple URLs.

Example of XML
<add>
<doc>
<field name="id">34122</field>
<field name="author">Michael</field>
<field name="size">3MB</field>
<field name="URL">URL of the document</field>
</doc>
</add>
<doc2>.....</doc2>...</docN>

I need to index all these documents by sending this XML in a single
URL. The collection of documents to be indexed could be on a file system.

I have altered the Solr code to be able to do this but is there an already
existing feature?
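
For comparison, here is a rough client-side sketch of what the request amounts to when done outside Solr. It is not the single-request feature asked about above, since it still produces one request per document. It reads the metadata XML and builds one Solr Cell request URL per <doc>, passing each metadata field as a literal.<name> parameter and the document link as stream.url. The endpoint, the function name build_extract_requests, and the sample file URL are assumptions made up for illustration, and stream.url only works if remote streaming is enabled on the server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port and handler path to your setup.
EXTRACT_ENDPOINT = "http://localhost:8983/solr/update/extract"


def build_extract_requests(xml_text, url_field="URL"):
    """Return one Solr Cell request URL per <doc> element in xml_text."""
    root = ET.fromstring(xml_text)
    requests = []
    for doc in root.iter("doc"):
        params = []
        for field in doc.findall("field"):
            name = field.get("name")
            value = (field.text or "").strip()
            if name == url_field:
                # The link to the content is fetched by Solr itself
                # (assumes remote streaming is enabled).
                params.append(("stream.url", value))
            else:
                # Metadata is attached to the extracted document as-is.
                params.append(("literal." + name, value))
        requests.append(EXTRACT_ENDPOINT + "?" + urlencode(params))
    return requests


# Same shape as the example above; the file URL is a made-up placeholder.
SAMPLE = """<add>
<doc>
<field name="id">34122</field>
<field name="author">Michael</field>
<field name="size">3MB</field>
<field name="URL">file:///tmp/report.pdf</field>
</doc>
</add>"""

if __name__ == "__main__":
    for request_url in build_extract_requests(SAMPLE):
        print(request_url)
```

Each generated URL would still have to be sent separately, which is exactly the overhead a server-side solution avoids.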


