On 2/5/2014 11:20 PM, Sesha Sendhil Subramanian wrote:
> I am running solr cloud with 10 shards. I do a batch indexing once everyday
> and once indexing is done I call optimize.
> 
> I see that optimize happens on each shard one at a time and not in
> parallel. Is it possible for the optimize to happen in parallel? Each shard
> is on a separate box.

I assume that you are optimizing the collection, and that SolrCloud is
taking care of the optimization of each core automatically.  I've not
looked into how this works, so I could be completely wrong about this.
If this is what you are doing, here's my best guess as to why it works
the way it does:

Optimizing an index is extremely I/O intensive.  The full index must be
read from the original files and re-written.  Unless the index is small
or available RAM is very large, an optimize is also likely to
temporarily push relevant data out of the OS disk cache, so queries
that were being served from memory have to go to disk instead.  This has
a strong negative impact on performance.  If you do this on all your
shards at once, the performance impact could be catastrophic, even if
they are all on separate machines.

I would not recommend it, but if you know for sure that your
infrastructure can handle it, then you should be able to optimize them
all at once by sending parallel optimize requests with distrib=false
directly to the Solr cores that hold the shard replicas, rather than
to the collection.
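
As a rough illustration of that approach, here is a minimal Python sketch
that sends one optimize request per core at the same time.  The host names
and core names in CORE_URLS are made up, and the helper names
(build_optimize_url, http_get, optimize_all) are mine, not part of Solr.
It assumes each core's update handler is reachable at <core URL>/update
and accepts optimize=true along with distrib=false.  I have not tested
this against a live SolrCloud cluster, so treat it as a starting point.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical core URLs, one per shard replica.
# Substitute your own hosts and core names.
CORE_URLS = [
    "http://solr1.example.com:8983/solr/collection1_shard1_replica1",
    "http://solr2.example.com:8983/solr/collection1_shard2_replica1",
]


def build_optimize_url(core_url):
    """Build an update-handler URL that asks only this core to optimize."""
    params = urlencode({"optimize": "true", "distrib": "false"})
    return f"{core_url.rstrip('/')}/update?{params}"


def http_get(url):
    """Send the request and return the HTTP status code.

    The long timeout is an assumption: an optimize can run for a long
    time on a large index.
    """
    with urlopen(url, timeout=3600) as response:
        return response.status


def optimize_all(core_urls, send=http_get):
    """Issue one optimize request per core, all in parallel.

    Returns a dict mapping each core URL to whatever `send` returned.
    """
    urls = [build_optimize_url(u) for u in core_urls]
    # One worker thread per core so no request waits behind another.
    with ThreadPoolExecutor(max_workers=max(1, len(urls))) as pool:
        return dict(zip(core_urls, pool.map(send, urls)))
```

Calling optimize_all(CORE_URLS) after the daily indexing run would start
every optimize at once and return when the slowest one finishes.  The
`send` argument is only there so the request function can be swapped out,
for example to do a dry run that prints the URLs instead of hitting Solr.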

Thanks,
Shawn
