[ https://issues.apache.org/jira/browse/AURORA-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maxim Khutornenko reassigned AURORA-1506: ----------------------------------------- Assignee: Maxim Khutornenko > Job updater does not handle DELETED tasks properly > -------------------------------------------------- > > Key: AURORA-1506 > URL: https://issues.apache.org/jira/browse/AURORA-1506 > Project: Aurora > Issue Type: Bug > Components: Scheduler > Reporter: Maxim Khutornenko > Assignee: Maxim Khutornenko > > We have observed cases when job updater fails to correctly process tasks that > were deleted from the store (e.g. PENDING or THROTTLED being killed). Sample > log: > {noformat} > I0929 00:01:59.101 THREAD4453 > org.apache.aurora.scheduler.updater.JobUpdateControllerImpl$6.execute: > Forwarding task change for role/staging/job/1 > I0929 00:01:59.103 THREAD4453 > org.apache.aurora.scheduler.updater.InstanceUpdater.handleActualAndDesiredPresent: > Task is in terminal state FAILED > I0929 00:01:59.103 THREAD4453 > org.apache.aurora.scheduler.updater.InstanceUpdater.addFailureAndCheckIfFailed: > Observed updated task failure. > I0929 00:01:59.103 THREAD4453 > org.apache.aurora.common.util.StateMachine$Builder$1.execute: Instance 1 > state machine transition WORKING -> FAILED > I0929 00:01:59.103 THREAD4453 > org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.evaluateUpdater: > IJobUpdateKey{job=IJobKey{role=role, environment=staging, name=job}, > id=38889720-36e4-4640-99c5-399ebf3a68a0} evaluation result: > EvaluationResult{status=FAILED, > sideEffects={1=SideEffect{action=Optional.absent(), statusChanges=[FAILED]}}} > I0929 00:01:59.103 THREAD4453 > org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.changeJobUpdateStatus: > Update IJobUpdateKey{job=IJobKey{role=role, environment=staging, name=job}, > id=38889720-36e4-4640-99c5-399ebf3a68a0} is now in state ROLLING_BACK > I0929 00:01:59.112 THREAD4453 > org.apache.aurora.common.util.StateMachine$Builder$1.execute: Instance 1 > state machine transition IDLE -> WORKING > I0929 00:01:59.112 THREAD4453 > org.apache.aurora.scheduler.updater.OneWayJobUpdater.startNextInstanceGroup: > Changed working set for update to [1] > I0929 00:01:59.112 THREAD4453 > org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.evaluateUpdater: > IJobUpdateKey{job=IJobKey{role=role, environment=staging, name=job}, > id=38889720-36e4-4640-99c5-399ebf3a68a0} evaluation result: > EvaluationResult{status=WORKING, > sideEffects={1=SideEffect{action=Optional.of(KILL_TASK), > statusChanges=[WORKING]}}} > I0929 00:01:59.113 THREAD4453 > org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.evaluateUpdater: > Executing side-effects for update of IJobUpdateKey{job=IJobKey{role=role, > environment=staging, name=job}, id=38889720-36e4-4640-99c5-399ebf3a68a0}: > {1=SideEffect{action=Optional.of(KILL_TASK), statusChanges=[WORKING]}} > I0929 00:01:59.113 THREAD4453 > org.apache.aurora.scheduler.updater.InstanceActionHandler$KillTask.getReevaluationDelay: > Killing IInstanceKey{jobKey=IJobKey{role=role, environment=staging, > name=job}, instanceId=1} while ROLLING_BACK > I0929 00:01:59.113 THREAD4453 > org.apache.aurora.common.util.StateMachine$Builder$1.execute: > 1443484918618-role-staging-job-1-5d1301fb-5bd8-4d51-b4b9-fd9115d766f0 state > machine transition THROTTLED -> KILLING > I0929 00:01:59.113 THREAD4453 > org.apache.aurora.scheduler.state.TaskStateMachine.addFollowup: Adding work > command DELETE for > 1443484918618-role-staging-job-1-5d1301fb-5bd8-4d51-b4b9-fd9115d766f0 > {noformat} > Later on, when a task change event is received, the updater attempts to run > evaluator again but this time can't find a task that has just being DELETED: > {noformat} > I0929 00:01:59.382 THREAD4454 > org.apache.aurora.scheduler.updater.JobUpdateControllerImpl$6.execute: > Forwarding task change for role/staging/job/1 > I0929 00:01:59.385 THREAD4454 > org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.evaluateUpdater: > IJobUpdateKey{job=IJobKey{role=role, environment=staging, name=job}, > id=38889720-36e4-4640-99c5-399ebf3a68a0} evaluation result: > EvaluationResult{status=WORKING, > sideEffects={1=SideEffect{action=Optional.of(KILL_TASK), statusChanges=[]}}} > I0929 00:01:59.385 THREAD4454 > org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.evaluateUpdater: > Executing side-effects for update of IJobUpdateKey{job=IJobKey{role=role, > environment=staging, name=job}, id=38889720-36e4-4640-99c5-399ebf3a68a0}: > {1=SideEffect{action=Optional.of(KILL_TASK), statusChanges=[]}} > E0929 00:01:59.386 THREAD4454 > org.apache.aurora.scheduler.updater.JobUpdateEventSubscriber.taskChangedState: > Failed to handle state change: java.util.NoSuchElementException > java.util.NoSuchElementException > at com.google.common.collect.Iterators$1.next(Iterators.java:80) > at > com.google.common.collect.Iterators.getOnlyElement(Iterators.java:302) > at > com.google.common.collect.Iterables.getOnlyElement(Iterables.java:289) > at > org.apache.aurora.scheduler.updater.InstanceActionHandler$KillTask.getReevaluationDelay(InstanceActionHandler.java:104) > at > org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.evaluateUpdater(JobUpdateControllerImpl.java:668) > at > org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.access$1400(JobUpdateControllerImpl.java:108) > at > org.apache.aurora.scheduler.updater.JobUpdateControllerImpl$6.execute(JobUpdateControllerImpl.java:356) > at > org.apache.aurora.scheduler.storage.Storage$MutateWork$NoResult.apply(Storage.java:137) > at > org.apache.aurora.scheduler.storage.Storage$MutateWork$NoResult.apply(Storage.java:132) > at > org.apache.aurora.scheduler.storage.log.LogStorage$24.apply(LogStorage.java:614) > at > org.apache.aurora.scheduler.storage.log.LogStorage$24.apply(LogStorage.java:611) > at > org.apache.aurora.scheduler.storage.db.DbStorage.transactionedWrite(DbStorage.java:146) > at > org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101) > at > org.apache.aurora.scheduler.storage.db.DbStorage$2.doWithGateClosed(DbStorage.java:162) > at > org.apache.aurora.scheduler.async.GatingDelayExecutor.closeDuring(GatingDelayExecutor.java:62) > at > org.apache.aurora.scheduler.storage.db.DbStorage.write(DbStorage.java:158) > at > org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:84) > at > org.apache.aurora.scheduler.storage.log.LogStorage.doInTransaction(LogStorage.java:611) > at > org.apache.aurora.scheduler.storage.log.LogStorage.write(LogStorage.java:644) > at > org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.write(CallOrderEnforcingStorage.java:130) > at > org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.instanceChanged(JobUpdateControllerImpl.java:347) > at > org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.instanceChangedState(JobUpdateControllerImpl.java:332) > at > org.apache.aurora.scheduler.updater.JobUpdateEventSubscriber.taskChangedState(JobUpdateEventSubscriber.java:56) > at sun.reflect.GeneratedMethodAccessor121.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > com.google.common.eventbus.EventSubscriber.handleEvent(EventSubscriber.java:74) > at > com.google.common.eventbus.SynchronizedEventSubscriber.handleEvent(SynchronizedEventSubscriber.java:47) > at com.google.common.eventbus.EventBus.dispatch(EventBus.java:322) > at > com.google.common.eventbus.AsyncEventBus.access$001(AsyncEventBus.java:34) > at > com.google.common.eventbus.AsyncEventBus$1.run(AsyncEventBus.java:117) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)