Sun Xin created HBASE-25014:
-------------------------------

             Summary: ScheduledChore is never triggered when initalDelay > 1.5*period
                 Key: HBASE-25014
                 URL: https://issues.apache.org/jira/browse/HBASE-25014
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.2.5, 2.2.4, 2.2.3, 3.0.0-alpha-1
            Reporter: Sun Xin
            Assignee: Sun Xin
             Fix For: 3.0.0-alpha-1

In our recent tests, a ScheduledChore is never triggered when initialDelay > 1.5 * period.

The cause of the bug is as follows:

The trigger time of a ScheduledChore must fall within an acceptable time window of 1.5 * period; see [here|https://github.com/apache/hbase/blob/e5ca9adc54f9f580f85d21d38217afa97aa79d68/hbase-common/src/main/java/org/apache/hadoop/hbase/ScheduledChore.java#L234].

timeOfLastRun and timeOfThisRun are two variables that record two adjacent trigger times. [The first initialization of timeOfThisRun|https://github.com/apache/hbase/blob/e5ca9adc54f9f580f85d21d38217afa97aa79d68/hbase-common/src/main/java/org/apache/hadoop/hbase/ScheduledChore.java#L273] happens when the ScheduledChore is created, so that value is not a real trigger time.

If we set initialDelay > 1.5 * period, the first trigger, which fires after initialDelay, already falls outside the allowed window. The chore is then [cancelled and scheduled again|https://github.com/apache/hbase/blob/e5ca9adc54f9f580f85d21d38217afa97aa79d68/hbase-common/src/main/java/org/apache/hadoop/hbase/ChoreService.java#L176].

So the chore is stuck in a loop when initialDelay > 1.5 * period:
1. timeOfThisRun is initialized at the wrong time.
2. The chore waits for initialDelay.
3. The chore is triggered, but the trigger falls outside the allowed window.
4. The chore is cancelled and scheduled again.
5. Go back to step 1.
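
The loop described above can be reproduced with a small simulation. This is only a sketch of the reported behaviour under a simulated clock, not the actual HBase code: the class ChoreWindowSketch and the methods missedStartTime and firstSuccessfulAttempt are hypothetical names introduced for illustration, and the model assumes that rescheduling re-initializes timeOfThisRun exactly as in step 1.

```java
// Simplified model of the reported loop; names are illustrative, not HBase APIs.
final class ChoreWindowSketch {

    /** True when the gap between two adjacent trigger times exceeds 1.5 * period. */
    static boolean missedStartTime(long timeOfLastRun, long timeOfThisRun, long period) {
        return (timeOfThisRun - timeOfLastRun) > 1.5 * period;
    }

    /**
     * Returns the 1-based attempt at which the chore body would first run,
     * or -1 if every attempt up to maxAttempts ends in a reschedule.
     */
    static int firstSuccessfulAttempt(long initialDelay, long period, int maxAttempts) {
        long now = 0; // simulated clock, in milliseconds
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Step 1: timeOfThisRun is set when the chore is (re)scheduled,
            // which is not a real trigger time.
            long timeOfThisRun = now;

            // Step 2: wait for initialDelay.
            now += initialDelay;

            // Step 3: the trigger shifts the timestamps before checking the window.
            long timeOfLastRun = timeOfThisRun;
            timeOfThisRun = now;

            if (!missedStartTime(timeOfLastRun, timeOfThisRun, period)) {
                return attempt; // inside the window: the chore body would run
            }
            // Step 4: cancel and schedule again; the loop returns to step 1.
        }
        return -1;
    }

    public static void main(String[] args) {
        // initialDelay == period: the first trigger is inside the window.
        System.out.println(firstSuccessfulAttempt(1000, 1000, 5)); // prints 1
        // initialDelay == 2 * period: every attempt misses the window.
        System.out.println(firstSuccessfulAttempt(2000, 1000, 5)); // prints -1
    }
}
```

In this model the measured gap at the first trigger always equals initialDelay, so any initialDelay above 1.5 * period fails the check on every attempt, which matches the behaviour reported here.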