Hi Erik,

Sorry I wasn't totally clear. Some responses inline:

> If the file is visible from the Solr server, there is no need to actually
> send the bits through HTTP. Solr's content stream capabilities allow a file
> to be retrieved from Solr itself.
>
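For reference, a content-stream request of the kind Erik describes, combined with the extract-only mode he mentions further down, could be built like this. It is a minimal sketch that assumes the ExtractingRequestHandler is mounted at `/update/extract` (as in the example solrconfig.xml), that remote streaming is enabled in solrconfig.xml (which `stream.file` requires), and a hypothetical base URL of `http://localhost:8983/solr`.

```python
from urllib.parse import urlencode

# Hypothetical Solr base URL; adjust host, port, and core path for your setup.
SOLR_BASE = "http://localhost:8983/solr"


def extract_only_url(file_path: str, base: str = SOLR_BASE) -> str:
    """Build a URL asking the ExtractingRequestHandler to read `file_path`
    from Solr's own filesystem (a content stream) and return the Tika
    output instead of indexing it."""
    params = {
        "stream.file": file_path,  # path must be visible to the Solr server
        "extractOnly": "true",     # return the extraction, don't index it
    }
    return f"{base}/update/extract?{urlencode(params)}"


if __name__ == "__main__":
    print(extract_only_url("/data/docs/report.pdf"))
```

Fetching that URL (for example with `urllib.request.urlopen`) returns the extracted content to the client without the binary ever being posted over HTTP.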
Yeah, I know. But in my case that's not possible. Perhaps a simple HTTP POST handler that receives a file, stores it on disk, and returns a path to it is the way to go here.

>> So I could send the file, and receive back a token which I would then
>> throw into one of my fields as a reference, then use it to map Tika
>> fields as well, like:
>>
>> <str name="file_mod_date">${FILETOKEN}.last_modified</str>
>>
>> <str name="file_body">${FILETOKEN}.content</str>
>
> Huh? I don't follow the file token thing. Perhaps you're thinking
> you'll post the file, then later update other fields on that same document.
> An important point here is that Solr currently does not have document
> update capabilities. A document can be fully replaced, but cannot have
> fields added to it once indexed. It needs to be handled all in one shot to
> accomplish the blending of file/field indexing. Note the
> ExtractingRequestHandler already has the field mapping capability.
>

Sorta... I was thinking more of a new feature wherein a Solr request handler doesn't actually put the file in the index, but merely runs it through Tika and keeps a datastore that links a "token" to the Tika extraction. The client could then make another request with the XMLUpdateHandler that references parts of the stored Tika extraction.

> But, here's a solution that will work for you right now... let Tika extract
> the content and return it to you, then turn around and post it and
> whatever other fields you like:
>
> <http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput>
>
> In that example, the contents aren't being indexed, just returned to
> the client. And you can leverage the content stream capability with this as
> well, avoiding posting the actual binary file by pointing the extracting
> request to a file path visible to Solr.
>

Yeah, I saw that.
This is pretty much what I was talking about above; the only disadvantage (which is a deal breaker in our case) is the extra bandwidth needed to move the file back and forth.

Thanks for your help and quick response. I think we'll integrate the POST fields, now that Grant has kindly provided multi-value input, and see what happens in the future. I realize what I'm talking about (XML and binary together) is probably not a high-priority feature.

Best,
Jacob

> Erik
>

--
+1 510 277-0891 (o)
+91 9999 33 7458 (m)
web: http://pajamadesign.com
Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com
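A minimal sketch of the file-receiving POST handler floated earlier in this message, using only Python's standard library. Everything here is hypothetical and illustrative: the staging directory (assumed to be one the Solr server can also read), the `upload-<hex>` naming scheme, and port 8090. It accepts a raw file body, writes it to disk, and replies with the stored path, which a client could then hand to Solr as a `stream.file` value. It does no authentication or size limiting, so it is not production-ready.

```python
import os
import tempfile
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical staging directory; in practice it must be readable by Solr.
UPLOAD_DIR = tempfile.gettempdir()


def store_upload(data: bytes, directory: str = UPLOAD_DIR) -> str:
    """Write the uploaded bytes to a uniquely named file and return its path."""
    path = os.path.join(directory, f"upload-{uuid.uuid4().hex}")
    with open(path, "wb") as f:
        f.write(data)
    return path


class UploadHandler(BaseHTTPRequestHandler):
    """Accepts a raw file body via POST and replies with the stored path."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        path = store_upload(self.rfile.read(length))
        body = path.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Port 8090 is an arbitrary choice for this sketch.
    HTTPServer(("", 8090), UploadHandler).serve_forever()
```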