We didn't copy/paste the Solr3 config to Solr4. We started with the Solr4 config
and only updated the new-searcher queries and a few other things.

There is no batching while updating/inserting documents in Solr3, is that
correct? Committing 1000 documents in Solr3 takes 19 seconds, while in Solr4 it
takes about 3-4 minutes. We noticed in the Solr4 logs that a commit only returns
after a new searcher has been created across all nodes. This is possibly because
waitSearcher=true by default in Solr4. This was not the case with Solr3, where
a commit would return without waiting for new searcher creation.

In order to improve performance with Solr4, we first changed from commit=true
to commit=false in the update URL and added an auto hard commit (autoCommit)
setting in solrconfig.xml. This improved the load time from 3-4 minutes to 1-2
minutes, but that is not good enough.

Then we changed the maxBufferedAddsPerServer value in the SolrCmdDistributor
class from 10 to 1000, deployed this class in the
$JETTY_TEMP_FOLDER/solr-webapp/webapp/WEB-INF/classes folder, and restarted the
Solr4 nodes. But we still see the batch size of 10 being used. Did we change
the correct variable/class?

Next, we will try using softCommit=true in the update URL and check whether it
gives us the desired performance.
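
For reference, the update URL variants discussed above could be built along
these lines. This is only a rough sketch: the host, port, collection name, and
shard id are placeholders (assumptions, not our real values), and the
update_url helper is hypothetical. Only the parameter names (_shard_, commit,
waitSearcher, softCommit) come from this thread.

```python
from urllib.parse import urlencode

# Placeholder endpoint: host, port, and collection name are assumptions.
BASE = "http://localhost:8983/solr/collection1/update"


def update_url(shard_id, **commit_params):
    """Build an update URL with the given commit-related parameters."""
    params = {"_shard_": shard_id}  # custom sharding, as in our setup
    params.update(commit_params)
    return BASE + "?" + urlencode(params)


# Hard commit that asks Solr not to block until the new searcher is ready.
print(update_url("shard1", commit="true", waitSearcher="false"))
# No explicit commit; relies on the auto commit set in solrconfig.xml.
print(update_url("shard1", commit="false"))
# Soft commit requested with the update itself.
print(update_url("shard1", softCommit="true"))
```

Timing the same 1000-document load against each variant should show how much
of the 3-4 minutes is spent waiting on the searcher rather than on indexing.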

Thanks for looking into this. Appreciate your help. 

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, August 13, 2013 8:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr4 update and query performance question

1> That's hard-coded at present. There's anecdotal evidence that there
     are throughput improvements with larger batch sizes, but no action
     yet.
2> Yep, all searchers are also re-opened, caches re-warmed, etc.
3> Odd. I'm assuming your Solr3 was a master/slave setup? Seeing the
    queries would help diagnose this. Also, did you try to copy/paste
    the configuration from your Solr3 to Solr4? I'd start with the
    Solr4 config and copy/paste only the parts needed from your Solr3 setup.

Best
Erick


On Mon, Aug 12, 2013 at 11:38 AM, Joshi, Shital <shital.jo...@gs.com> wrote:

> Hi,
>
> We have a SolrCloud (4.4.0) cluster (5 shards and 2 replicas) on 10 boxes
> with about 450 mil documents (~90 mil per shard). We're loading 1000 or
> fewer documents in CSV format every few minutes. In Solr3, with 300 mil
> documents, it used to take 30 seconds to load 1000 documents, while in
> Solr4 it's taking up to 3 minutes to load 1000 documents. We're using
> custom sharding; we include the _shard_=shardid parameter in the update
> command. Upon looking at the Solr4 log files we found that:
>
> 1.       Documents are added in a batch of 10 records. How do we increase
> this batch size from 10 to 1000 documents?
>
> 2.      We do a hard commit after loading 1000 documents. Every hard
> commit refreshes the searcher on all nodes. Are all caches also refreshed
> when a hard commit happens? We're planning to change to soft commits and do
> an auto hard commit every 10-15 minutes.
>
> 3.      We're not seeing improved query performance compared to Solr3.
> Queries which took 3-5 seconds in Solr3 (300 mil docs) are taking 20
> seconds with Solr4. We think this could be due to frequent hard commits and
> searcher refreshes. Do you think we will see better query performance when
> we change to soft commits and increase the batch size?
>
> Thanks!
>
>
>
