[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15772669#comment-15772669
 ] 

Naganarasimha G R commented on MAPREDUCE-6826:
----------------------------------------------

[~varun_saxena]
{code} 
I think we can probably ignore JOB_TASK_COMPLETED event while JobImpl state is 
COMMITTING or SUCCEEDED as job is in the process of finishing.
Also is there any point in attempting to scheduler another task attempt if job 
is already in COMMITTING or SUCCEEDED state.
{code}
Yes you are right, IMO in {{TaskAttemptImpl.KilledAfterSuccessTransition}} we 
only check if its {{TaskType.REDUCE}} and ignore the {{TA_KILL}} event i think 
we can ignore preempt messages if the job is in *COMMITTING or SUCCEEDED* 
state. so that relaunching of containers can be avoided and at the same time it 
would be good if Job ignores {{JOB_TASK_COMPLETED}} event when in {{SUCCEEDED}} 
state

> Job fails with InvalidStateTransitonException: Invalid event: 
> JOB_TASK_COMPLETED at SUCCEEDED
> ---------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6826
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6826
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.7.2
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>
> This happens if a container is preempted by scheduler after job starts 
> committing.
> And this exception in turn leads to application being marked as FAILED in 
> YARN.
> I think we can probably ignore JOB_TASK_COMPLETED event while JobImpl state 
> is COMMITTING or SUCCEEDED as job is in the process of finishing.
> Also is there any point in attempting to scheduler another task attempt if 
> job is already in COMMITTING or SUCCEEDED state.
> {noformat}
> 2016-12-23 09:10:38,642 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
> task_1482404625971_23910_m_000004 Task Transitioned from RUNNING to SUCCEEDED
> 2016-12-23 09:10:38,642 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 5
> 2016-12-23 09:10:38,643 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
> job_1482404625971_23910Job Transitioned from RUNNING to COMMITTING
> 2016-12-23 09:10:38,644 INFO [ContainerLauncher #5] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing 
> the event EventType: CONTAINER_REMOTE_CLEANUP for container 
> container_e55_1482404625971_23910_01_000010 taskAttempt 
> attempt_1482404625971_23910_m_000004_1
> 2016-12-23 09:10:38,644 INFO [ContainerLauncher #5] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING 
> attempt_1482404625971_23910_m_000004_1
> 2016-12-23 09:10:38,644 INFO [ContainerLauncher #5] 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: 
> Opening proxy : linux-19:26009
> 2016-12-23 09:10:38,644 INFO [CommitterEvent Processor #4] 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing 
> the event EventType: JOB_COMMIT
> 2016-12-23 09:10:38,724 INFO [IPC Server handler 0 on 27113] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : 
> jvm_1482404625971_23910_m_60473139527690 asked for a task
> 2016-12-23 09:10:38,724 INFO [IPC Server handler 0 on 27113] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: 
> jvm_1482404625971_23910_m_60473139527690 is invalid and will be killed.
> 2016-12-23 09:10:38,797 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Calling handler for 
> JobFinishedEvent 
> 2016-12-23 09:10:38,797 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
> job_1482404625971_23910Job Transitioned from COMMITTING to SUCCEEDED
> 2016-12-23 09:10:38,798 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Job finished cleanly, 
> recording last MRAppMaster retry
> 2016-12-23 09:10:38,798 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator 
> isAMLastRetry: true
> 2016-12-23 09:10:38,798 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: RMCommunicator notified 
> that shouldUnregistered is: true
> 2016-12-23 09:10:38,799 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify JHEH isAMLastRetry: 
> true
> 2016-12-23 09:10:38,799 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: 
> JobHistoryEventHandler notified that forceJobCompletion is true
> 2016-12-23 09:10:38,799 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Calling stop for all the 
> services
> 2016-12-23 09:10:38,800 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopping 
> JobHistoryEventHandler. Size of the outstanding queue size is 1
> 2016-12-23 09:10:38,989 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 
> AssignedReds:0 CompletedMaps:5 CompletedReds:0 ContAlloc:8 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-23 09:10:38,993 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received 
> completed container container_e55_1482404625971_23910_01_000010
> 2016-12-23 09:10:38,993 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: 
> PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:0 
> CompletedMaps:5 CompletedReds:0 ContAlloc:8 ContRel:0 HostLocal:0 RackLocal:0
> 2016-12-23 09:10:38,993 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
> report from attempt_1482404625971_23910_m_000004_1: Container preempted by 
> scheduler
> 2016-12-23 09:10:38,994 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1482404625971_23910_m_000004_1 TaskAttempt Transitioned from 
> SUCCEEDED to KILLED
> 2016-12-23 09:10:38,995 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
> task_1482404625971_23910_m_000004 Task Transitioned from SUCCEEDED to 
> SCHEDULED
> 2016-12-23 09:10:38,996 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1482404625971_23910_m_000004_2 TaskAttempt Transitioned from NEW to 
> UNASSIGNED
> 2016-12-23 09:10:39,044 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:0 CompletedMaps:5 CompletedReds:0 ContAlloc:8 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-23 09:10:39,050 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482404625971_23910: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit=<memory:0, vCores:0> knownNMs=11
> 2016-12-23 09:10:40,053 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated 
> containers 1
> 2016-12-23 09:10:40,055 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned 
> container container_e55_1482404625971_23910_01_000011 to 
> attempt_1482404625971_23910_m_000004_2
> 2016-12-23 09:10:40,055 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: 
> PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 
> CompletedMaps:5 CompletedReds:0 ContAlloc:9 ContRel:0 HostLocal:0 RackLocal:0
> 2016-12-23 09:10:40,056 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1482404625971_23910_m_000004_2 TaskAttempt Transitioned from 
> UNASSIGNED to ASSIGNED
> 2016-12-23 09:10:40,056 INFO [ContainerLauncher #6] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing 
> the event EventType: CONTAINER_REMOTE_LAUNCH for container 
> container_e55_1482404625971_23910_01_000011 taskAttempt 
> attempt_1482404625971_23910_m_000004_2
> 2016-12-23 09:10:40,056 INFO [ContainerLauncher #6] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching 
> attempt_1482404625971_23910_m_000004_2
> 2016-12-23 09:10:40,056 INFO [ContainerLauncher #6] 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: 
> Opening proxy : linux-17:26009
> 2016-12-23 09:10:40,068 INFO [ContainerLauncher #6] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Shuffle 
> port returned by ContainerManager for attempt_1482404625971_23910_m_000004_2 
> : 26008
> 2016-12-23 09:10:40,074 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: 
> [attempt_1482404625971_23910_m_000004_2] using containerId: 
> [container_e55_1482404625971_23910_01_000011 on NM: [linux-17:26009]
> 2016-12-23 09:10:40,074 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1482404625971_23910_m_000004_2 TaskAttempt Transitioned from ASSIGNED 
> to RUNNING
> 2016-12-23 09:10:40,075 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
> task_1482404625971_23910_m_000004 Task Transitioned from SCHEDULED to RUNNING
> 2016-12-23 09:10:40,108 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1482404625971_23910: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit=<memory:0, vCores:0> knownNMs=11
> 2016-12-23 09:10:42,949 INFO [Socket Reader #1 for port 27113] 
> SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for 
> job_1482404625971_23910 (auth:SIMPLE)
> 2016-12-23 09:10:42,995 INFO [IPC Server handler 6 on 27113] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : 
> jvm_1482404625971_23910_m_60473139527691 asked for a task
> 2016-12-23 09:10:42,995 INFO [IPC Server handler 6 on 27113] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: 
> jvm_1482404625971_23910_m_60473139527691 given task: 
> attempt_1482404625971_23910_m_000004_2
> 2016-12-23 09:10:46,163 INFO [IPC Server handler 29 on 27113] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit-pending state update 
> from attempt_1482404625971_23910_m_000004_2
> 2016-12-23 09:10:46,165 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1482404625971_23910_m_000004_2 TaskAttempt Transitioned from RUNNING 
> to COMMIT_PENDING
> 2016-12-23 09:10:46,165 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
> attempt_1482404625971_23910_m_000004_2 given a go for committing the task 
> output.
> 2016-12-23 09:10:46,166 INFO [IPC Server handler 9 on 27113] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request 
> from attempt_1482404625971_23910_m_000004_2
> 2016-12-23 09:10:46,167 INFO [IPC Server handler 9 on 27113] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Result of canCommit for 
> attempt_1482404625971_23910_m_000004_2:true
> 2016-12-23 09:10:46,265 INFO [IPC Server handler 20 on 27113] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from 
> attempt_1482404625971_23910_m_000004_2
> 2016-12-23 09:10:46,267 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1482404625971_23910_m_000004_2 TaskAttempt Transitioned from 
> COMMIT_PENDING to SUCCEEDED
> 2016-12-23 09:10:46,267 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with 
> attempt attempt_1482404625971_23910_m_000004_2
> 2016-12-23 09:10:46,267 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
> task_1482404625971_23910_m_000004 Task Transitioned from RUNNING to SUCCEEDED
> 2016-12-23 09:10:46,267 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event 
> at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> JOB_TASK_COMPLETED at SUCCEEDED
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>       at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:997)
>       at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:139)
>       at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1399)
>       at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1395)
>       at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:192)
>       at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:118)
>       at java.lang.Thread.run(Thread.java:745)
> 2016-12-23 09:10:46,270 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
> job_1482404625971_23910Job Transitioned from SUCCEEDED to ERROR
> 2016-12-23 09:10:46,271 INFO [Thread-97] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator 
> isAMLastRetry: true
> 2016-12-23 09:10:46,271 INFO [Thread-97] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: RMCommunicator notified 
> that shouldUnregistered is: true
> 2016-12-23 09:10:46,271 INFO [Thread-97] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify JHEH isAMLastRetry: 
> true
> 2016-12-23 09:10:46,271 INFO [Thread-97] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: 
> JobHistoryEventHandler notified that forceJobCompletion is true
> 2016-12-23 09:10:46,271 INFO [Thread-97] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Calling stop for all the 
> services
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org

Reply via email to