On 2/5/2014 11:20 PM, Sesha Sendhil Subramanian wrote:
> I am running solr cloud with 10 shards. I do a batch indexing once everyday
> and once indexing is done I call optimize.
>
> I see that optimize happens on each shard one at a time and not in
> parallel. Is it possible for the optimize to happen in parallel? Each shard
> is on a separate box.

I assume that you are optimizing the collection, and that SolrCloud is taking care of the optimization of each core automatically. I've not looked into how this works, so I could be completely wrong about this.

If this is what you are doing, here's my best guess as to why it works the way it does:

Optimizing an index is extremely I/O intensive. The full index must be read from the original files and re-written. Unless the index is small or available RAM is very large, it is also likely that doing an optimize will temporarily push relevant data out of the OS disk cache. This has a strong negative impact on performance.

If you do this on all your shards at once, the performance impact could be catastrophic, even if they are all on separate machines.

I would not recommend it, but if you know for sure that your infrastructure can handle it, then you should be able to optimize them all at once by sending parallel optimize requests with distrib=false directly to the Solr cores that hold the shard replicas, not the collection.

Thanks,
Shawn
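
For illustration, the parallel approach described above might be sketched as follows. This is a minimal Python sketch under stated assumptions, not a tested recipe: the host names and core names are hypothetical placeholders, and it assumes each core accepts an optimize through its /update handler with optimize=true, combined with the distrib=false parameter suggested above. Only the standard library is used.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical core URLs, one per shard replica. Replace with your own.
CORE_URLS = [
    "http://solr1.example.com:8983/solr/collection1_shard1_replica1",
    "http://solr2.example.com:8983/solr/collection1_shard2_replica1",
]


def optimize_url(core_url):
    """Build an optimize request aimed at one core, with distrib=false."""
    params = urlencode({"optimize": "true", "distrib": "false"})
    return core_url.rstrip("/") + "/update?" + params


def http_get(url):
    """Send the request and return the HTTP status code.

    No timeout is set, because an optimize can run for a long time.
    """
    with urlopen(url) as resp:
        return resp.status


def optimize_all(core_urls, send=http_get):
    """Send one optimize request per core concurrently.

    Returns a dict mapping each core URL to whatever `send` returned.
    An exception raised by any single request propagates to the caller.
    """
    if not core_urls:
        return {}
    # One worker thread per core, so every request is in flight at once.
    with ThreadPoolExecutor(max_workers=len(core_urls)) as pool:
        results = pool.map(lambda u: send(optimize_url(u)), core_urls)
        return dict(zip(core_urls, results))
```

Calling optimize_all(CORE_URLS) would start every optimize at the same moment, which is exactly the I/O load the warning above is about, so it may be worth trying on one or two cores first.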