> Thanks Shawn, clean way to do it, indeed. And going your route, one
> could even copy the existing shards into the new collection and then
> delete the data which is getting reindexed on the new nodes. That would
> spare reindexing everything.
>
> But in my case, I add boxes after a noticeable performance degradation
> due to data volume increase. So the old boxes cannot afford reindexing
> data (or deleting it, if using the proposed variation) in the new
> collection while serving searches with the old collection. Unless there
> is a way to aggressively bound the RAM consumption of the new
> collection (by disabling mmap?), given that it's not being used for
> search during the transition? That said, even if that were possible,
> both collections would compete for disk I/O.

I don't think you'd want to disable mmap. It could be done by choosing
another DirectoryFactory implementation, but adding memory is likely to
be the only sane way forward.
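
If you did want to experiment with that anyway, the directory
implementation is selected in solrconfig.xml. A minimal sketch, assuming
the stock example config; note that the index data still flows through
the OS page cache, so this avoids the mmap mapping but not memory
pressure in general:

  <!-- solrconfig.xml: read the index through NIO file I/O instead of
       mmap. Searches will likely be slower, and index pages still pass
       through the OS page cache. -->
  <directoryFactory name="DirectoryFactory"
                    class="solr.NIOFSDirectoryFactory"/>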

Another possibility would be to bump up the maxShardsPerNode value and
build the new collection (with the proper number of shards) only on the
new machines... Then when they are built, move them to their proper
homes and manually adjust the cluster state in ZooKeeper. This will
still generate a lot of I/O, but hopefully it will take less wall-clock
time, and it is something you can do when load is low.
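
A sketch of that first step, with collection, config, and host names as
placeholders: createNodeSet restricts initial shard placement to the
machines you list, and maxShardsPerNode lets several shards share one
box (wrapped here for readability, but it is a single URL):

  http://anynode:8983/solr/admin/collections?action=CREATE
      &name=newcollection
      &numShards=4
      &maxShardsPerNode=4
      &collection.configName=myconf
      &createNodeSet=newnode1:8983_solr,newnode2:8983_solr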

After that's done and you've switched over, you can add replicas with
either the ADDREPLICA action of the Collections API or the CoreAdmin
API. You should be on the newest Solr version... lots of bugs have been
found and fixed.
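
An ADDREPLICA sketch, again with placeholder names; the optional node
parameter pins the new replica to a specific machine instead of letting
Solr pick one:

  http://anynode:8983/solr/admin/collections?action=ADDREPLICA
      &collection=newcollection
      &shard=shard1
      &node=newnode3:8983_solr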

One thing I wonder is whether the MIGRATE API can be used on an entire
collection. It says it works by shard key, but I suspect that most users
will not be using that functionality.
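
For reference, the documented form is keyed on a routing prefix, so a
whole-collection move would presumably need a split.key that covers
every document, and I don't know that this is supported (names here are
placeholders):

  http://anynode:8983/solr/admin/collections?action=MIGRATE
      &collection=oldcollection
      &split.key=A!
      &target.collection=newcollection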

Thanks,
Shawn
