[ https://issues.apache.org/jira/browse/SOLR-12729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki resolved SOLR-12729.
--------------------------------------
    Resolution: Fixed

> SplitShardCmd should lock the parent shard to prevent parallel splitting requests
> ---------------------------------------------------------------------------------
>
>                 Key: SOLR-12729
>                 URL: https://issues.apache.org/jira/browse/SOLR-12729
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: AutoScaling
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Major
>             Fix For: 7.6, master (8.0)
>
>
> This scenario was discovered by the simulation framework, but it also exists
> in the non-simulated code.
> When {{IndexSizeTrigger}} requests SPLITSHARD, the command is started
> successfully and is “completed” from the point of view of {{ExecutePlanAction}}.
> In reality, it can still take a significant amount of time before the new
> replicas fully recover and the shard states switch (parent to INACTIVE,
> children from RECOVERY to ACTIVE).
> If this time is longer than the trigger's {{waitFor}}, the trigger will issue
> the same SPLITSHARD request again. {{SplitShardCmd}} doesn't prevent this new
> request from being processed because the parent shard is still ACTIVE.
> However, a section of the code in {{SplitShardCmd}} will notice that
> sub-slices with the target names already exist and are not active, at which
> point it deletes those sub-slices ({{SplitShardCmd:182}}).
> The end result is an infinite loop, where {{IndexSizeTrigger}} keeps
> generating SPLITSHARD, and {{SplitShardCmd}} keeps deleting the recovering
> sub-slices created by the previous command.
> A simple solution is to mark the parent shard as being in the process of
> splitting, so that no other split is attempted on the same shard.
> Furthermore, {{IndexSizeTrigger}} could temporarily exclude such shards from
> monitoring.
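A minimal sketch of the proposed fix, written in Python for brevity. It assumes a hypothetical in-memory model of shard state: `Shard`, `Collection`, `split_shard`, `complete_split`, `abort_split` and `monitorable_shards` are illustrative names, not Solr APIs, and this is not the committed patch. Solr keeps cluster state in ZooKeeper; here a process-local mutex stands in for that. The parent is atomically marked as splitting before sub-shards are created, a repeated request is rejected instead of deleting the recovering sub-slices, and the trigger-side filter skips marked shards.

```python
import threading
from dataclasses import dataclass, field

# Hypothetical, simplified shard states and sub-shard naming (illustrative only).
ACTIVE, INACTIVE, RECOVERY = "active", "inactive", "recovery"
SUFFIXES = ("_0", "_1")


@dataclass
class Shard:
    name: str
    state: str = ACTIVE
    splitting: bool = False  # marker: a split of this shard is in progress


@dataclass
class Collection:
    shards: dict = field(default_factory=dict)
    _mutex: threading.Lock = field(default_factory=threading.Lock,
                                   repr=False, compare=False)

    def try_lock_for_split(self, parent: str) -> bool:
        """Atomically mark the parent as splitting; False if it can't be split now."""
        with self._mutex:
            shard = self.shards[parent]
            if shard.splitting or shard.state != ACTIVE:
                return False
            shard.splitting = True
            return True

    def create_sub_shards(self, parent: str) -> None:
        """Register the sub-shards; they start in RECOVERY."""
        with self._mutex:
            for suffix in SUFFIXES:
                self.shards[parent + suffix] = Shard(parent + suffix, state=RECOVERY)

    def complete_split(self, parent: str) -> None:
        """Run only after all sub-shard replicas have recovered."""
        with self._mutex:
            self.shards[parent].state = INACTIVE
            for suffix in SUFFIXES:
                self.shards[parent + suffix].state = ACTIVE
            self.shards[parent].splitting = False

    def abort_split(self, parent: str) -> None:
        """Failure path: drop the sub-shards and clear the marker."""
        with self._mutex:
            for suffix in SUFFIXES:
                self.shards.pop(parent + suffix, None)
            self.shards[parent].splitting = False

    def monitorable_shards(self) -> list:
        """Trigger-side filter: ACTIVE shards with no split in progress."""
        with self._mutex:
            return sorted(s.name for s in self.shards.values()
                          if s.state == ACTIVE and not s.splitting)


def split_shard(coll: Collection, parent: str) -> bool:
    """Start a split; a duplicate request is rejected rather than deleting sub-shards."""
    if not coll.try_lock_for_split(parent):
        return False
    coll.create_sub_shards(parent)
    # Returning here looks "completed" to the caller, but the marker stays set
    # until complete_split() or abort_split() runs.
    return True


coll = Collection({"shard1": Shard("shard1")})
print(split_shard(coll, "shard1"))   # True: split started
print(split_shard(coll, "shard1"))   # False: repeated trigger request rejected
print(coll.monitorable_shards())     # []: parent is marked, children recovering
coll.complete_split("shard1")
print(coll.monitorable_shards())     # ['shard1_0', 'shard1_1']
```

The design choice that matters is where the marker is released: not when the command returns (which is all {{ExecutePlanAction}} observes) but only once the sub-shards become ACTIVE, or on an explicit abort. That closes the window in which a second request with the same target names could be accepted.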
-- This message was sent by Atlassian JIRA (v7.6.3#76005)