Re: Solrj ContentStreamUpdateRequest Slow

Tod Wed, 18 Aug 2010 05:39:11 -0700

On 8/16/2010 6:12 PM, Chris Hostetter wrote:

: > I think your problem may be that StreamingUpdateSolrServer buffers up
: > commands and sends them in batches in a background thread.  if you want to
: > send individual updates in real time (and time them) you should just use
: > CommonsHttpSolrServer
:
: My goal is to batch updates. My content lives somewhere else so I was trying
: to find a way to tell Solr where the document lived so it could go out and
: stream it into the index for me.  That's where I thought
: StreamingUpdateSolrServer would help.
If your content lives on a machine which is neither your "client" nor your "server" and you want your client to tell your server to go fetch it directly then the "stream.url" param is what you need -- that is unrelated to whether you use StreamingUpdateSolrServer or not.

Do you happen to have a code fragment lying around that demonstrates using CommonsHttpSolrServer and "stream.url"? I've tried it in conjunction with ContentStreamUpdateRequest and I keep getting an annoying null pointer exception. In the meantime I will check the examples...

Thinking about it some more, I suspect the reason you might be seeing a delay when using StreamingUpdateSolrServer is because of this bug...
   https://issues.apache.org/jira/browse/SOLR-1990
...if there are no actual documents in your UpdateRequest (because you are using the stream.url param) then the StreamingUpdateSolrServer blocks until all other requests are done, then delegates to the super class (so it never actually puts your indexing requests in a buffered queue, it just delays and then does them immediately)
Not sure of a good way around this off the top of my head, but I'll note it in SOLR-1990 as another problematic use case that needs to be dealt with.

Perhaps I can execute an initial update request using a benign file before making the "stream.url" call?


Also, to beat a dead horse, this:
'http://localhost:8080/solr/update/extract?stream.url=http://remote_server.mydomain.com/test.pdf&stream.contentType=application/pdf&literal.content_id=12342&commit=true'

... works fine - I just want to do it a LOT and as efficiently as possible. If I have to I can wrap it in a perl script and run a cURL or LWP loop but I'd prefer to use SolrJ if I can.
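
For reference, the request above can also be issued from plain Java without SolrJ, which sidesteps the StreamingUpdateSolrServer delay entirely. The following is only a minimal sketch using JDK classes: the class name ExtractUrlBuilder and its methods are made up for illustration, and the host, path and parameter values are simply the ones from the URL quoted above. It builds the same /update/extract URL with each parameter value URL-encoded and offers a helper that sends it over HTTP.

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical helper, not part of SolrJ: builds and sends the same
// /update/extract request that the cURL example issues.
public class ExtractUrlBuilder {

    // URL-encode one query parameter value as UTF-8.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Build the extract URL; commit=true is appended only when requested.
    static String buildExtractUrl(String solrBase, String docUrl,
                                  String contentType, String contentId,
                                  boolean commit) {
        StringBuilder sb = new StringBuilder(solrBase);
        sb.append("/update/extract");
        sb.append("?stream.url=").append(enc(docUrl));
        sb.append("&stream.contentType=").append(enc(contentType));
        sb.append("&literal.content_id=").append(enc(contentId));
        if (commit) {
            sb.append("&commit=true");
        }
        return sb.toString();
    }

    // Send the request and return the HTTP status code.
    static int send(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Assumed sample data: the second document is invented for the loop.
        String[] ids = {"12342", "12343"};
        String[] docs = {
            "http://remote_server.mydomain.com/test.pdf",
            "http://remote_server.mydomain.com/test2.pdf"
        };
        for (int i = 0; i < docs.length; i++) {
            boolean last = (i == docs.length - 1);
            // Commit only on the last document of the batch.
            System.out.println(buildExtractUrl("http://localhost:8080/solr",
                    docs[i], "application/pdf", ids[i], last));
        }
    }
}
```

The main method only prints the URLs; calling send(...) on each one would issue the actual requests. Committing once at the end of a batch, rather than with every document, is usually cheaper, though whether that suits this setup is an assumption.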


Thanks for all your help.


- Tod
