Jacob,
Hmmm... it seems the wires are still crossed.
On Dec 15, 2008, at 6:34 AM, Jacob Singh wrote:
> This is indeed what I was talking about... It could even be handled
> via some type of transient file storage system. This might even be
> better, to avoid the risks associated with uploading a huge file
> across a network, and might (I have no idea) be easier to implement.
If the file is visible from the Solr server, there is no need to
actually send the bits over HTTP. Solr's content stream
capabilities allow the file to be retrieved by Solr itself.
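A sketch of what that can look like, assuming remote streaming has been enabled in solrconfig.xml and the extracting handler is mapped at /update/extract (handler path and parameter names vary by Solr version, so adjust for yours):

```
# Remote streaming must be switched on in solrconfig.xml first, e.g.:
#   <requestParsers enableRemoteStreaming="true" ... />
# Solr then reads the file from its own filesystem; no bits travel
# in the request body. The path /data/docs/report.pdf is a placeholder.
curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true&stream.file=/data/docs/report.pdf"
```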
> So I could send the file, and receive back a token which I would
> then throw into one of my fields as a reference, then use it to map
> Tika fields as well, like:
>
> <str name="file_mod_date">${FILETOKEN}.last_modified</str>
> <str name="file_body">${FILETOKEN}.content</str>
Huh? I don't follow the file token thing. Perhaps you're thinking
you'll post the file, then later update other fields on that same
document. An important point here: Solr currently does not have
document update capabilities. A document can be fully replaced, but
once indexed it cannot have fields added to it. The blending of file
and field indexing has to be handled all in one shot. Note that the
ExtractingRequestHandler already has the field mapping capability.
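A sketch of that one-shot approach, with literal field values and Tika field mapping in a single request (parameter names here follow the later literal.*/fmap.* convention; earlier versions of the handler used different prefixes, so check your release):

```
# One-shot indexing: Tika-extracted content plus your own fields.
# literal.* supplies ordinary field values; fmap.* renames Tika's
# extracted fields into your schema. Field and file names are
# placeholders for illustration.
curl "http://localhost:8983/solr/update/extract?literal.id=doc1&fmap.content=file_body&fmap.last_modified=file_mod_date&commit=true" \
     -F "myfile=@report.pdf"
```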
But here's a solution that will work for you right now: have Tika
extract the content and return it back to you, then turn around and
post it along with whatever other fields you like:
<http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput>
In that example, the contents aren't indexed, just returned to the
client. And you can leverage the content stream capability here as
well, avoiding posting the actual binary file by pointing the
extracting request at a file path visible to Solr.
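The two-step flow could look something like this (the extractOnly parameter name is from later Solr releases and the paths and field names are placeholders, so treat this as a sketch):

```
# Step 1: have Solr run Tika and hand back the extracted text
# without indexing anything.
curl "http://localhost:8983/solr/update/extract?extractOnly=true&stream.file=/data/docs/report.pdf"

# Step 2: post the extracted text yourself, alongside any other
# fields you want on the document.
curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" \
     --data-binary '<add><doc>
       <field name="id">doc1</field>
       <field name="file_body">...extracted text from step 1...</field>
     </doc></add>'
```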
Erik