On 8/16/2010 6:12 PM, Chris Hostetter wrote:
: > I think your problem may be that StreamingUpdateSolrServer buffers up
: > commands and sends them in batches in a background thread.  if you want to
: > send individual updates in real time (and time them) you should just use
: > CommonsHttpSolrServer
:
: My goal is to batch updates. My content lives somewhere else so I was trying
: to find a way to tell Solr where the document lived so it could go out and
: stream it into the index for me.  That's where I thought
: StreamingUpdateSolrServer would help.

If your content lives on a machine which is neither your "client" nor your "server", and you want your client to tell your server to go fetch it directly, then the "stream.url" param is what you need -- that is unrelated to whether you use StreamingUpdateSolrServer or not.


Do you happen to have a code fragment lying around that demonstrates using CommonsHttpSolrServer and "stream.url"? I've tried it in conjunction with ContentStreamUpdateRequest and I keep getting an annoying null pointer exception. In the meantime I will check the examples...



Thinking about it some more, i suspect the reason you might be seeing a delay when using StreamingUpdateSolrServer is because of this bug...

   https://issues.apache.org/jira/browse/SOLR-1990

...if there are no actual documents in your UpdateRequest (because you are using the stream.url param) then the StreamingUpdateSolrServer blocks until all other requests are done, then delegates to the super class (so it never actually puts your indexing requests in a buffered queue, it just delays and then does them immediately)

Not sure of a good way around this off the top of my head, but i'll note it in SOLR-1990 as another problematic use case that needs to be dealt with.

Perhaps I can execute an initial update request using a benign file before making the "stream.url" call?

Also, to beat a dead horse, this:
'http://localhost:8080/solr/update/extract?stream.url=http://remote_server.mydomain.com/test.pdf&stream.contentType=application/pdf&literal.content_id=12342&commit=true'

... works fine - I just want to do it a LOT and as efficiently as possible. If I have to I can wrap it in a perl script and run a cURL or LWP loop but I'd prefer to use SolrJ if I can.
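
For reference, here is a rough Java sketch of how the URL above could be assembled per document before sending it in a loop. It uses only the JDK and is not SolrJ; the class name ExtractUrlBuilder and its method names are made up for illustration, and the parameter names are simply the ones from the URL above. Each value is percent-encoded, which matters once the remote URL carries its own query string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of SolrJ): builds an /update/extract URL
// that asks Solr to fetch the document itself via stream.url.
final class ExtractUrlBuilder {

    // Percent-encode one query-string value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // extractHandlerUrl: e.g. http://localhost:8080/solr/update/extract
    // remoteUrl:         where the document lives (passed as stream.url)
    // contentId:         value for the literal.content_id field
    static String build(String extractHandlerUrl, String remoteUrl,
                        String contentType, String contentId, boolean commit) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("stream.url", remoteUrl);
        params.put("stream.contentType", contentType);
        params.put("literal.content_id", contentId);
        if (commit) {
            params.put("commit", "true");
        }

        StringBuilder url = new StringBuilder(extractHandlerUrl);
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(separator)
               .append(p.getKey())
               .append('=')
               .append(encode(p.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(
                "http://localhost:8080/solr/update/extract",
                "http://remote_server.mydomain.com/test.pdf",
                "application/pdf", "12342", true));
    }
}
```

For a large batch it would probably be cheaper to pass commit=false for every document and issue a single commit at the end, since a commit per document is expensive.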

Thanks for all your help.


- Tod
