Re: Task attempt failed after TaskAttemptListenerImpl ping

Harsh J Thu, 23 May 2013 22:25:40 -0700

Assuming you mean "failed" there instead of "filed".

In MR, the Task container talks to the MR AM over the
TaskUmbilicalProtocol. A ping is sent only as a fallback liveness
check, when the task has no progress to report. Having no progress
to report for a long time generally means the task has stopped doing
work, isn't updating its status, or is stuck.
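
To make the above concrete, here is a rough model of the task-side
reporting loop. It is a simplified sketch in Python, not the Hadoop
Java source, and every name in it (FakeUmbilical, report_once) is made
up for illustration. It only shows the decision described above: send
a status update when there is new progress, otherwise send a ping.

```python
# Simplified model of the task-side reporting decision; NOT Hadoop code.
# FakeUmbilical and report_once are hypothetical names for illustration.

class FakeUmbilical:
    """Stand-in for the AM end of the umbilical; records what it receives."""

    def __init__(self):
        self.calls = []

    def status_update(self, attempt_id, progress):
        self.calls.append(("status_update", attempt_id, progress))

    def ping(self, attempt_id):
        self.calls.append(("ping", attempt_id))


def report_once(umbilical, attempt_id, progress_flag, progress):
    """Run one reporting iteration.

    If the task flagged new progress since the last report, send a status
    update; otherwise fall back to a ping. Returns the cleared flag.
    """
    if progress_flag:
        umbilical.status_update(attempt_id, progress)
    else:
        umbilical.ping(attempt_id)
    return False


if __name__ == "__main__":
    am = FakeUmbilical()
    attempt = "attempt_1369298403742_0153_m_000000_0"
    flag = True  # the task reported progress once, then went quiet
    for _ in range(3):
        flag = report_once(am, attempt, flag, 0.25)
    for call in am.calls:
        print(call)
```

In this model, a task that reports progress once and then goes quiet
produces one status update followed only by pings, which is the pattern
in the container syslog [2], where the pings arrive about three seconds
apart.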


On Fri, May 24, 2013 at 8:46 AM, YouPeng Yang <yypvsxf19870...@gmail.com> wrote:
> Hi hadoop users
>
>  I find that One application filed when the  container log it shows that it
> always ping [2].
>
> How does it come out?
>
> I'm using the YARN and MRv2(CDH-4.1.2)
>
>
>
> [1]resourcemanager.log
> 2013-05-24 09:45:07,192 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done
> launching container Container: [ContainerId:
> container_1369298403742_0144_01_000001, NodeId: wxossetl3:29984,
> NodeHttpAddress: wxossetl3:8042, Resource: memory: 1536, Priority:
> org.apache.hadoop.yarn.api.records.impl.pb.PriorityPBImpl@1f, State: NEW,
> Token: null, Status: container_id {, app_attempt_id {, application_id {, id:
> 144, cluster_timestamp: 1369298403742, }, attemptId: 1, }, id: 1, }, state:
> C_NEW, ] for AM appattempt_1369298403742_0144_000001
> 2013-05-24 09:45:07,192 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1369298403742_0144_000001 State change from ALLOCATED to LAUNCHED
> 2013-05-24 09:45:08,186 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
> container_1369298403742_0144_01_000001 Container Transitioned from ACQUIRED
> to RUNNING
> 2013-05-24 09:45:10,533 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: AM
> registration appattempt_1369298403742_0144_000001
> 2013-05-24 09:45:10,533 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop
> IP=172.16.250.1 OPERATION=Register App Master TARGET=ApplicationMasterService
> RESULT=SUCCESS APPID=application_1369298403742_0144
> APPATTEMPTID=appattempt_1369298403742_0144_000001
> 2013-05-24 09:45:10,533 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1369298403742_0144_000001 State change from LAUNCHED to RUNNING
> 2013-05-24 09:45:10,533 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
> application_1369298403742_0144 State change from ACCEPTED to RUNNING
>
> [2] container syslog:
>
>
> 2013-05-24 10:00:10,222 INFO [uber-SubtaskRunner]
> org.apache.hadoop.mapred.LocalContainerLauncher: Processing the event
> EventType: CONTAINER_REMOTE_LAUNCH for container
> container_1369298403742_0153_01_000001 taskAttempt
> attempt_1369298403742_0153_m_000000_0
> 2013-05-24 10:00:10,223 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt:
> [attempt_1369298403742_0153_m_000000_0] using containerId:
> [container_1369298403742_0153_01_000001 on NM: [wxossetl1:46256]
> 2013-05-24 10:00:10,223 INFO [uber-SubtaskRunner]
> org.apache.hadoop.mapred.LocalContainerLauncher: mapreduce.cluster.local.dir
> for uber task:
> /tmp/nm-local-dir/usercache/hadoop/appcache/application_1369298403742_0153
> 2013-05-24 10:00:10,225 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
> attempt_1369298403742_0153_m_000000_0 TaskAttempt Transitioned from ASSIGNED
> to RUNNING
> 2013-05-24 10:00:10,226 INFO [AsyncDispatcher event handler]
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:
> task_1369298403742_0153_m_000000 Task Transitioned from SCHEDULED to RUNNING
> 2013-05-24 10:00:10,237 INFO [uber-SubtaskRunner]
> org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin :
> org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@34e77781
> 2013-05-24 10:00:13,224 INFO [communication thread]
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
> attempt_1369298403742_0153_m_000000_0
> 2013-05-24 10:00:16,225 INFO [communication thread]
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
> attempt_1369298403742_0153_m_000000_0
> 2013-05-24 10:00:19,225 INFO [communication thread]
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Ping from
> attempt_1369298403742_0153_m_000000_0
>
> ......
>



-- 
Harsh J
