Robert Joseph Evans created STORM-3024:
------------------------------------------
Summary: Allow scheduling for RAS to happen in the background
Key: STORM-3024
URL: https://issues.apache.org/jira/browse/STORM-3024
Project: Apache Storm
Issue Type: New Feature
Components: storm-server
Affects Versions: 2.0.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans

We have recently run into issues where a scheduling strategy on a very large cluster occasionally takes an unusually long time to finish scheduling. This slowness cascades into other problems, such as topologies that cannot be killed because the timer thread is still busy running the scheduler.

The plan is to make scheduling happen in a thread pool. The main thread will wait up to a configurable amount of time for the topology to be scheduled. If scheduling does not complete in that time, it will be left running in the background thread in the hope that the topology is scheduled later.

If the state of the cluster changes while scheduling is happening in the background, we will cancel that scheduling attempt, because any schedule it produces may no longer fit on the cluster. The next time the scheduler runs it will restart the attempt. This should let the cluster reach a steady state, even if that takes a while, without blocking kills and other critical operations.

Note that we are also working on optimizing scheduling so that these issues don't happen in the first place.
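To make the proposal concrete, here is a minimal sketch of the wait-then-background pattern described above, written against plain java.util.concurrent. It is not Storm code: the class BackgroundScheduler, the clusterVersion counter used to detect cluster changes, and the maxWaitMs setting are all hypothetical names chosen for illustration, and the sketch assumes that only the main scheduling thread calls schedule().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Storm. R is whatever a strategy produces.
class BackgroundScheduler<R> {
    // One in-flight attempt per topology, tagged with the cluster state
    // (an assumed version counter) it was started against.
    private static final class Attempt<R> {
        final long clusterVersion;
        final Future<R> future;

        Attempt(long clusterVersion, Future<R> future) {
            this.clusterVersion = clusterVersion;
            this.future = future;
        }
    }

    private final ExecutorService pool;
    private final long maxWaitMs; // stands in for the "configurable amount of time"
    // Plain HashMap: assumes schedule() is only called from one thread.
    private final Map<String, Attempt<R>> inFlight = new HashMap<>();

    BackgroundScheduler(int threads, long maxWaitMs) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.maxWaitMs = maxWaitMs;
    }

    // Returns the strategy's (non-null) result if it is ready within maxWaitMs.
    // Otherwise returns empty and leaves the attempt running in the pool.
    Optional<R> schedule(String topoId, long clusterVersion, Callable<R> strategy)
            throws InterruptedException, ExecutionException {
        Attempt<R> attempt = inFlight.get(topoId);
        if (attempt != null && attempt.clusterVersion != clusterVersion) {
            // The cluster changed under a running attempt: its result may no
            // longer fit, so cancel it and start over below.
            attempt.future.cancel(true);
            inFlight.remove(topoId);
            attempt = null;
        }
        if (attempt == null) {
            attempt = new Attempt<>(clusterVersion, pool.submit(strategy));
            inFlight.put(topoId, attempt);
        }
        try {
            R result = attempt.future.get(maxWaitMs, TimeUnit.MILLISECONDS);
            inFlight.remove(topoId);
            return Optional.of(result);
        } catch (TimeoutException e) {
            // Not done yet: keep the attempt so a later call can pick it up.
            return Optional.empty();
        } catch (ExecutionException e) {
            inFlight.remove(topoId);
            throw e;
        }
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

In this sketch a later call with the same clusterVersion reuses the attempt that is already running instead of submitting a new one, so a slow strategy is not restarted on every scheduling round. Note that Future.cancel(true) only interrupts the worker thread, so a real strategy would have to check for interruption in order to stop promptly.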