> Thanks Shawn, clean way to do it, indeed. And going your route, one
> could even copy the existing shards into the new collection and then
> delete the data which is getting reindexed on the new nodes. That would
> spare reindexing everything.
>
> But in my case, I add boxes after a noticeable performance degradation
> due to data volume increase. So the old boxes cannot afford reindexing
> data (or deleting it, if using the proposed variation) in the new
> collection while serving searches with the old collection. Unless there
> is a way to aggressively bound the RAM consumption of the new collection
> (disabling MMAP?), given that it's not being used for search during the
> transition? That said, even if that were possible, both collections
> would compete for disk I/O.
I don't think you'd want to disable mmap. It could be done, by choosing another DirectoryFactory implementation, but adding memory is likely to be the only sane way forward.

Another possibility would be to bump up the maxShardsPerNode value and build the new collection (with the proper number of shards) only on the new machines. Then, when the shards are built, move them to their proper homes and manually adjust the cluster state in ZooKeeper. This will still generate a lot of I/O, but hopefully it will take less time on the wall clock, and it will be something you can do when load is low. After that's done and you've switched over, you can add replicas with either the ADDREPLICA collections API or the CoreAdmin API.

You should be on the newest Solr version... lots of bugs have been found and fixed.

One thing I wonder is whether the MIGRATE API can be used on an entire collection. It says it works by shard key, but I suspect that most users will not be using that functionality.

Thanks,
Shawn
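P.S. For completeness, switching away from mmap would mean changing the directoryFactory in solrconfig.xml, something like the sketch below (NIOFSDirectoryFactory is one non-mmap option; as noted above, this usually hurts more than it helps):

```xml
<!-- solrconfig.xml: replace the default mmap-backed factory with an
     NIO-based one. Index files are no longer mapped into virtual
     memory, at a significant cost in read performance. -->
<directoryFactory name="DirectoryFactory"
                  class="solr.NIOFSDirectoryFactory"/>
```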
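The "build only on the new machines" idea above would be a Collections API CREATE call roughly like this (hostnames, ports, shard counts and collection names are all placeholders for your setup):

```shell
# Sketch: build the CREATE request. createNodeSet restricts shard
# placement to the new machines; maxShardsPerNode allows several
# shards to coexist on each node during the build.
URL="http://newnode1:8983/solr/admin/collections?action=CREATE"
URL="$URL&name=newcollection&numShards=8&replicationFactor=1"
URL="$URL&maxShardsPerNode=4"
URL="$URL&createNodeSet=newnode1:8983_solr,newnode2:8983_solr"
echo "$URL"
# curl "$URL"   # run this against the cluster when ready
```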
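And the follow-up calls, again with placeholder names. Note that MIGRATE takes a split.key, which is exactly the open question above; if you aren't using shard-key routing it may not cover a whole collection:

```shell
# Sketch: add a replica of shard1 onto one of the old nodes once the
# new collection is live.
ADD="http://newnode1:8983/solr/admin/collections?action=ADDREPLICA"
ADD="$ADD&collection=newcollection&shard=shard1&node=oldnode1:8983_solr"
echo "$ADD"

# Sketch: MIGRATE moves documents matching a shard key into a target
# collection. "A!" is a hypothetical shard-key prefix.
MIG="http://newnode1:8983/solr/admin/collections?action=MIGRATE"
MIG="$MIG&collection=oldcollection&target.collection=newcollection"
MIG="$MIG&split.key=A!"
echo "$MIG"
# curl "$ADD" ; curl "$MIG"   # run against the cluster when ready
```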