The beauty of StreamingUpdateSolrServer is that you don't have to worry about batch sizes; it streams them all. Just keep calling add() with one document and it'll get enqueued. You can pass a collection but there's no performance benefit.
StreamingUpdateSolrServer can be configured to use multiple simultaneous streams into Solr... I wouldn't use as many as you have CPUs; I'd go with 2 then keep adding 1 till your docs/sec levels off.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

On Jan 12, 2010, at 12:52 PM, Smith G wrote:

> Hello ,
> I am using add() method which receives Collection of
> SolrInputDocuments instead of add() which receives a single document.
> I am afraid, is sending a group of documents being called as
> "batching" in Solr terminology? . If yes, then I am doing it ( by
> including additional logic in my code ). But the main point I dont get
> is how big a batch could be? How to find most suitable number of
> SolrDocs that could be sent at a time.
> Also, In case If I go for multi-threaded commons, then the
> number of threads to be used is equal to N of "N"-core processor, for
> being optimal? .
> Thanks.
>
> 2010/1/12 Yonik Seeley <yo...@lucidimagination.com>:
>> On Tue, Jan 12, 2010 at 3:48 AM, Smith G <gudumba.sm...@gmail.com> wrote:
>>> Hello All,
>>> I am trying to find a better approach ( perfomance wise
>>> ) to index documents. Document count is approximately a million+.
>>> First, I thought of writing multiple threads using
>>> CommonsHttpSolrServer to submit documents. But later I found out
>>> StreamingUpdateSolrServer, which says we can forget about batching.
>>>
>>> 1) We can pass thread-count parameter to StreamingUpdateSolrServer,
>>> does it exactly serve the same as writing multiple threads using
>>> CommonsHttpSolrServer ?.
>>
>> Not quite - streaming update solr server batches documents on the fly.
>> So if you have a server with N CPUs, you should only need N threads
>> to saturate it. Using multiple threads with CommonsHttpSolrServer,
>> it's still one document per request (unless you do your own batching)
>> and there is still latency between request and response, meaning it
>> would take more threads to fill in that latency.
>>
>> -Yonik
>> http://www.lucidimagination.com
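
The tuning advice at the top of this message (start with 2 streams, add 1 until docs/sec levels off) can be sketched as a small loop. This is only an illustration and not part of SolrJ: the class `ThreadCountTuner`, the `docsPerSec` callback, and the `minGain` threshold are all hypothetical names. In a real run the callback would presumably index a sample of documents through a StreamingUpdateSolrServer built with the given thread count and return the measured rate; here it is replaced by simulated numbers.

```java
import java.util.function.IntToDoubleFunction;

/** Sketch of the "start at 2, add 1 until docs/sec levels off" tuning loop. */
class ThreadCountTuner {

    /**
     * @param docsPerSec hypothetical callback: runs a trial indexing pass with the
     *                   given thread count and returns the measured docs/sec
     * @param maxThreads upper bound on the thread counts to try
     * @param minGain    smallest relative improvement still counted as a gain
     *                   (0.05 means 5%); an assumed knob, not a Solr setting
     * @return the last thread count that still improved throughput
     */
    static int tune(IntToDoubleFunction docsPerSec, int maxThreads, double minGain) {
        int best = 2;
        double bestRate = docsPerSec.applyAsDouble(best);
        for (int n = best + 1; n <= maxThreads; n++) {
            double rate = docsPerSec.applyAsDouble(n);
            if (rate < bestRate * (1.0 + minGain)) {
                break; // throughput has levelled off; keep the previous count
            }
            best = n;
            bestRate = rate;
        }
        return best;
    }

    public static void main(String[] args) {
        // Simulated measurements standing in for real indexing runs:
        // throughput grows by 1000 docs/sec per thread and flattens at 4 threads.
        IntToDoubleFunction simulated = n -> 1000.0 * Math.min(n, 4);
        System.out.println("chosen thread count: " + tune(simulated, 16, 0.05));
    }
}
```

Real measurements are noisy, so each trial should probably run long enough (or be repeated) for the docs/sec figure to be stable before comparing it with the previous one.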