Hi,

We have a SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes with
about 450 mil documents (~90 mil per shard). We're loading 1000 or fewer
documents in CSV format every few minutes. In Solr3, with 300 mil documents, it
used to take 30 seconds to load 1000 documents, while in Solr4 it's taking up to
3 minutes to load 1000 documents. We're using custom sharding: we include the
_shard_=shardid parameter in the update command. Looking at the Solr4 log files,
we found that:

1. Documents are added in batches of 10 records. How do we increase this
batch size from 10 to 1000 documents?

2. We do a hard commit after loading 1000 documents. Every hard commit
refreshes the searcher on all nodes. Are all caches also refreshed when a hard
commit happens? We're planning to switch to soft commits and do an auto hard
commit every 10-15 minutes.

3. We're not seeing improved query performance compared to Solr3. Queries
which took 3-5 seconds in Solr3 (300 mil docs) are taking 20 seconds with
Solr4. We think this could be due to the frequent hard commits and searcher
refreshes. Do you think we will see better query performance once we change to
soft commits and increase the batch size?
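
To make the plan in point 2 concrete, here is a minimal sketch of how we would
build each load request after the change. The host, collection name, shard id,
CSV payload and the helper name build_load_request are all placeholders, and
we're assuming that softCommit=true on the update request is enough to trigger
a soft commit in 4.4.0. The 10-15 minute hard commit is not shown; that would
be an autoCommit setting in solrconfig.xml.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_load_request(base_url, shard_id, csv_body, soft_commit=True):
    """Build (but do not send) one CSV update request.

    Hypothetical helper for illustration only.
    """
    # _shard_ is the custom sharding parameter we already send today.
    params = {"_shard_": shard_id}
    if soft_commit:
        # Planned behaviour: soft commit after each load.
        params["softCommit"] = "true"
    else:
        # Current behaviour: hard commit after each load.
        params["commit"] = "true"
    url = base_url + "/update?" + urlencode(params)
    # Supplying a body makes this a POST; the content type marks it as CSV.
    return Request(url, data=csv_body.encode("utf-8"),
                   headers={"Content-Type": "text/csv"})


# Placeholder host, collection, shard and data.
req = build_load_request(
    "http://localhost:8983/solr/collection1",
    "shard1",
    "id,title\n1,example\n",
)
print(req.get_method(), req.full_url)
```

Passing soft_commit=False gives the request we send today, so the two
behaviours can be timed against each other on the same batch.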

Thanks!

