On 3/25/2015 9:08 AM, pavelhladik wrote:
> Our data change frequently, which is why there are so many deletedDocs.
> The optimized core takes around 50GB on disk, we are now at almost
> 100GB, and I'm looking for the best way to optimize this huge core
> without downtime. I know optimization runs in the background, but while
> it is running our search system is slow and I sometimes receive errors -
> that behavior is like downtime for us.
>
> I would like to switch to SolrCloud. Performance is not an issue, so I
> don't need the sharding feature at this time. I'm more interested in
> replication and distributing requests through an Nginx proxy. The idea
> is:
>
> 1) the proxy forwards requests to node1 while cores are optimized on node2
> 2) the proxy forwards requests to node2 while cores are optimized on node1
>
> But when I optimize on node2, node1 does the optimization as well, even
> if I use "distrib=false" with curl.
You are correct - with SolrCloud, any optimize command will optimize the
entire collection, one shard replica at a time, regardless of any distrib
parameter. It does NOT optimize multiple replicas or shards in parallel.
I thought we had an issue in Jira asking to make optimize honor a
"distrib=false" parameter, but I can't find it. Even if that were fixed,
it would not help you, because SolrCloud only optimizes one shard replica
at any given moment.

Optimization does NOT directly result in downtime ... but because an
optimize generates a very large amount of disk I/O, it can be disruptive
if your server does not have enough spare resources. I don't have enough
information to say for sure, but I am betting that you don't have enough
RAM in your machine to effectively cache your index, so anything that
hurts performance as much as an optimize does is too much for your server
to handle at the same time as ongoing queries or indexing.

The info on this wiki page can help you determine how much total RAM you
might need:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn
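P.S. For reference, the kind of optimize request we're talking about
looks something like this with curl (hostname, port, and collection name
here are placeholders for whatever your setup uses):

  # Ask one node to optimize; maxSegments=1 merges the index down to a
  # single segment, which is what optimize does by default anyway.
  curl "http://node2:8983/solr/mycollection/update?optimize=true&maxSegments=1&distrib=false"

In SolrCloud, a request like this still optimizes every shard replica in
the whole collection, one replica at a time - the distrib=false parameter
has no effect on optimize.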