On 8/16/2010 6:12 PM, Chris Hostetter wrote:
: > I think your problem may be that StreamingUpdateSolrServer buffers up
: > commands and sends them in batches in a background thread. If you want to
: > send individual updates in real time (and time them) you should just use
: > CommonsHttpSolrServer.
:
: My goal is to batch updates. My content lives somewhere else so I was trying
: to find a way to tell Solr where the document lived so it could go out and
: stream it into the index for me. That's where I thought
: StreamingUpdateSolrServer would help.
If your content lives on a machine which is not your "client" nor your
"server" and you want your client to tell your server to go fetch it
directly then the "stream.url" param is what you need -- that is unrelated
to whether you use StreamingUpdateSolrServer or not.
Do you happen to have a code fragment lying around that demonstrates
using CommonsHttpSolrServer and "stream.url"? I've tried it in
conjunction with ContentStreamUpdateRequest and I keep getting an
annoying null pointer exception. In the meantime I will check the
examples...
Thinking about it some more, I suspect the reason you might be seeing a
delay when using StreamingUpdateSolrServer is because of this bug...
https://issues.apache.org/jira/browse/SOLR-1990
...if there are no actual documents in your UpdateRequest (because you are
using the stream.url param) then the StreamingUpdateSolrServer blocks
until all other requests are done, then delegates to the super class (so
it never actually puts your indexing requests in a buffered queue, it just
delays and then does them immediately)
Not sure of a good way around this off the top of my head, but I'll note
it in SOLR-1990 as another problematic use case that needs to be dealt with.
Perhaps I can execute an initial update request using a benign file
before making the "stream.url" call?
Also, to beat a dead horse, this:
'http://localhost:8080/solr/update/extract?stream.url=http://remote_server.mydomain.com/test.pdf&stream.contentType=application/pdf&literal.content_id=12342&commit=true'
... works fine - I just want to do it a LOT and as efficiently as
possible. If I have to, I can wrap it in a Perl script and run a cURL or
LWP loop, but I'd prefer to use SolrJ if I can.
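For reference, the same loop can be sketched in plain Java without SolrJ.
This is only a sketch under assumptions: the class and method names
(ExtractBatch, buildExtractUrl, send) and the second document URL are made
up for illustration, the Solr base URL and parameters are copied from the
URL above, and remote streaming has to be enabled on the server for
"stream.url" to be honored. It URL-encodes each parameter value and asks
for a commit only on the last document, since committing after every
document is usually the expensive part.

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

public class ExtractBatch {

    // URL-encode one query-string value as UTF-8.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Build the /update/extract request URL for one remote document.
    // Parameter names are the ones used in the URL quoted in this thread.
    static String buildExtractUrl(String solrBase, String docUrl,
                                  String contentType, String contentId,
                                  boolean commit) {
        StringBuilder sb = new StringBuilder(solrBase);
        sb.append("/update/extract");
        sb.append("?stream.url=").append(enc(docUrl));
        sb.append("&stream.contentType=").append(enc(contentType));
        sb.append("&literal.content_id=").append(enc(contentId));
        if (commit) {
            sb.append("&commit=true");
        }
        return sb.toString();
    }

    // Issue the request and return the HTTP status code.
    // Not exercised in main(); it needs a live Solr instance.
    static int send(String requestUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(requestUrl).openConnection();
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        String solr = "http://localhost:8080/solr";
        // The second URL is hypothetical, added only to show the loop.
        List<String> docs = Arrays.asList(
                "http://remote_server.mydomain.com/test.pdf",
                "http://remote_server.mydomain.com/test2.pdf");
        for (int i = 0; i < docs.size(); i++) {
            boolean last = (i == docs.size() - 1); // commit once, at the end
            String url = buildExtractUrl(solr, docs.get(i),
                    "application/pdf", String.valueOf(12342 + i), last);
            System.out.println(url); // swap in send(url) against a real server
        }
    }
}
```

As written, main() only prints the request URLs; calling send(url) in its
place would issue them one at a time from a single thread.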
Thanks for all your help.
- Tod