The beauty of StreamingUpdateSolrServer is that you don't have to worry about 
batch sizes; it streams them all.  Just keep calling add() with one document 
and it'll get enqueued.  You can pass a collection but there's no performance 
benefit.

StreamingUpdateSolrServer can be configured to use multiple simultaneous 
streams into Solr... I wouldn't use as many as you have CPUs; I'd go with 2 
then keep adding 1 till your docs/sec levels off.
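
A rough sketch of that tuning loop, in plain Java with no SolrJ dependency. Everything named here (ThreadCountTuner, pickThreadCount, the minGain threshold, and the measurement callback) is hypothetical and only for illustration. In a real run the callback would index a fixed sample of documents through a StreamingUpdateSolrServer built with that thread count and report the docs/sec it observed; the synthetic curve in main() just stands in for that measurement.

```java
import java.util.function.IntToDoubleFunction;

final class ThreadCountTuner {

    /**
     * Starts at startThreads and adds one thread at a time, stopping as soon
     * as the measured docs/sec fails to improve by at least minGain
     * (a relative gain, e.g. 0.05 for 5%). Returns the last thread count
     * that still produced a worthwhile improvement.
     */
    static int pickThreadCount(IntToDoubleFunction docsPerSec,
                               int startThreads,
                               int maxThreads,
                               double minGain) {
        int best = startThreads;
        double bestRate = docsPerSec.applyAsDouble(best);
        for (int n = startThreads + 1; n <= maxThreads; n++) {
            double rate = docsPerSec.applyAsDouble(n);
            if (rate < bestRate * (1.0 + minGain)) {
                break; // throughput has levelled off
            }
            best = n;
            bestRate = rate;
        }
        return best;
    }

    public static void main(String[] args) {
        // Synthetic throughput curve (an assumption, not real Solr data):
        // it scales linearly up to 4 threads and is flat after that.
        IntToDoubleFunction synthetic = n -> Math.min(n, 4) * 1000.0;
        System.out.println(pickThreadCount(synthetic, 2, 16, 0.05));
    }
}
```

The minGain threshold is there because real measurements are noisy; without it a one-off blip could keep the loop adding threads past the point where they help.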

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

On Jan 12, 2010, at 12:52 PM, Smith G wrote:

> Hello,
>             I am using the add() method that receives a Collection of
> SolrInputDocuments instead of the add() that receives a single document.
> Is sending a group of documents what is called "batching" in Solr
> terminology? If yes, then I am doing it (by including additional logic
> in my code). But the main point I don't get is how big a batch could
> be. How do I find the most suitable number of SolrDocs to send at a
> time?
>         Also, if I go for multi-threaded CommonsHttpSolrServer, is the
> optimal number of threads equal to N for an "N"-core processor?
> Thanks.
> 
> 2010/1/12 Yonik Seeley <yo...@lucidimagination.com>:
>> On Tue, Jan 12, 2010 at 3:48 AM, Smith G <gudumba.sm...@gmail.com> wrote:
>>> Hello All,
>>>               I am trying to find a better approach (performance-wise)
>>> to index documents. The document count is approximately a million+.
>>> First, I thought of writing multiple threads using
>>> CommonsHttpSolrServer to submit documents. But later I found
>>> StreamingUpdateSolrServer, which says we can forget about batching.
>>> 
>>> 1) We can pass a thread-count parameter to StreamingUpdateSolrServer;
>>> does it serve exactly the same purpose as writing multiple threads
>>> using CommonsHttpSolrServer?
>> 
>> Not quite - StreamingUpdateSolrServer batches documents on the fly.
>>  So if you have a server with N CPUs, you should only need N threads
>> to saturate it.  Using multiple threads with CommonsHttpSolrServer,
>> it's still one document per request (unless you do your own batching)
>> and there is still latency between request and response, meaning it
>> would take more threads to fill in that latency.
>> 
>> -Yonik
>> http://www.lucidimagination.com
>> 

