OK, thanks for the answer, Yonik. After looking closer at the index splitting code, it definitely seems like you wouldn't want to pay the network I/O cost when creating the sub-shard indexes. It might be cool to be able to specify a different local disk path for the new cores so that we can get some extra disks working in parallel during the split (icing on the cake, of course).
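
For anyone following along, here is a minimal sketch of how the split request itself can be built against the Collections API. The base URL, collection name, and shard name are placeholders (assumptions for illustration, not values from this thread), and `split_shard_url` is a hypothetical helper. Note that the request only names the collection and the shard; as discussed in this thread, as of 4.3.1 there is nothing in it to pick the node or disk path where the sub-shard cores end up.

```python
from urllib.parse import urlencode


def split_shard_url(base_url, collection, shard):
    """Build a Collections API SPLITSHARD request URL.

    base_url, collection and shard are placeholders supplied by the caller;
    this helper only assembles the URL and does not contact Solr.
    """
    params = urlencode({
        "action": "SPLITSHARD",
        "collection": collection,
        "shard": shard,
    })
    return f"{base_url.rstrip('/')}/admin/collections?{params}"


if __name__ == "__main__":
    # Hypothetical node and names, for illustration only.
    print(split_shard_url("http://localhost:8983/solr", "collection1", "shard1"))
```

Issuing a GET against the resulting URL (with curl or a browser) kicks off the split on the leader of the named shard.
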

Cheers,
Tim

On Wed, Jul 17, 2013 at 10:40 AM, Yonik Seeley <yo...@lucidworks.com> wrote:
> On Wed, Jul 17, 2013 at 12:26 PM, Timothy Potter <thelabd...@gmail.com> wrote:
>> This is not a problem per se, just want to verify that we're not able
>> to specify which server shard splits are created as of 4.3.1? From
>> what I've seen, the new cores for the sub-shards are created on the
>> leader of the shard being split.
>>
>> Of course it's easy enough to migrate the new sub-shards to another
>> node after the fact especially since replication occurs automatically
>> for the splits.
>>
>> Seems like if the shard being split is large enough that doing the
>> split on the same node could cause some resource issues so might be
>> better to do the split on another server. Or is my assumption that the
>> split operation is pretty expensive incorrect?
>
> I think it will be mostly IO - it may or may not be expensive
> depending on how IO bound your box already is.
>
> Splitting directly to a different servers would be cool, but would
> seem to require some sort of Directory implementation that streams
> things over the network rather than just locally store on disk. It's
> something I think we want in the future, but was a bit too much to
> bite off for the first iteration of this feature.
>
>> Lastly, also seems like we don't have control over where the replicas
>> of the split shards go?
>
> Seems like a good idea to optionally allow this...
>
> -Yonik
> http://lucidworks.com