From the logs, it is due to the container failing to launch. I guess it is due to
some YARN configuration issue. You’d better check the node manager logs.
It also looks like you haven’t enabled log aggregation, so you can’t get the
node manager logs with the command “yarn logs”.
You need to check each node manager machine; by default the logs are located in
$HADOOP_HOME/logs. In the node manager logs, you should be able to see why the
container fails to launch.
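
For example, on each node manager machine you could grep the daemon log for the
failed container id (a rough sketch; by default the NodeManager daemon log is
named like yarn-<user>-nodemanager-<host>.log, though the exact name depends on
your install):

  grep -A 20 container_1432885077153_0011_01_000002 $HADOOP_HOME/logs/yarn-*-nodemanager-*.log

The container’s own stdout/stderr, which usually contain the actual launch
error, live under the directory configured by yarn.nodemanager.log-dirs (often
something like $HADOOP_HOME/logs/userlogs/<application id>/<container id>).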


BTW, I would suggest you enable log aggregation. You can check this for
details:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.4/bk_yarn_resource_mgt/content/ref-375ff479-e530-46d8-9f96-8b52dadb5183.1.html
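
(A minimal sketch of what that page describes, assuming you can edit
yarn-site.xml on every node and restart the YARN daemons:

  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>

Once this is on, container logs are uploaded to HDFS after an application
finishes, and “yarn logs -applicationId <appId>” will return them.)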

Best Regards,
Jeff Zhang


From: "r7raul1...@163.com<mailto:r7raul1...@163.com>" 
<r7raul1...@163.com<mailto:r7raul1...@163.com>>
Reply-To: user <user@tez.apache.org<mailto:user@tez.apache.org>>
Date: Monday, June 1, 2015 at 8:07 AM
To: user <user@tez.apache.org<mailto:user@tez.apache.org>>
Subject: Re: Re: Tez launch container error when using UseG1GC


Log is:
Status: Running (Executing on YARN cluster with App id application_1432885077153_0011)

--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1             FAILED      1          0        0        1       4       0
Reducer 2         KILLED      1          0        0        1       0       1
Reducer 3         KILLED      1          0        0        1       0       1
--------------------------------------------------------------------------------
VERTICES: 00/03 [>>--------------------------] 0% ELAPSED TIME: 16.13 s
--------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1432885077153_0011_1_00, 
diagnostics=[Task failed, taskId=task_1432885077153_0011_1_00_000000, 
diagnostics=[TaskAttempt 0 failed, info=[Container 
container_1432885077153_0011_01_000002 finished with diagnostics set to 
[Container failed. Exception from container-launch.
Container id: container_1432885077153_0011_01_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:196)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
]], TaskAttempt 1 failed, info=[Container 
container_1432885077153_0011_01_000003 finished with diagnostics set to 
[Container failed. Exception from container-launch.
Container id: container_1432885077153_0011_01_000003
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:196)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
]], TaskAttempt 2 failed, info=[Container 
container_1432885077153_0011_01_000004 finished with diagnostics set to 
[Container failed. Exception from container-launch.
Container id: container_1432885077153_0011_01_000004
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:196)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
]], TaskAttempt 3 failed, info=[Container 
container_1432885077153_0011_01_000005 finished with diagnostics set to 
[Container failed. Exception from container-launch.
Container id: container_1432885077153_0011_01_000005
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:196)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1
]]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex 
vertex_1432885077153_0011_1_00 [Map 1] killed/failed due to:null]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1432885077153_0011_1_01, 
diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as 
other vertex failed. failedTasks:0, Vertex vertex_1432885077153_0011_1_01 
[Reducer 2] killed/failed due to:null]
Vertex killed, vertexName=Reducer 3, vertexId=vertex_1432885077153_0011_1_02, 
diagnostics=[Vertex received Kill while in RUNNING state., Vertex killed as 
other vertex failed. failedTasks:0, Vertex vertex_1432885077153_0011_1_02 
[Reducer 3] killed/failed due to:null]
DAG failed due to vertex failure. failedVertices:1 killedVertices:2
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.tez.TezTask

yarn logs -applicationId application_1432885077153_0011
15/06/01 08:07:01 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
Logs not available at /tmp/logs/root/logs/application_1432885077153_0011
Log aggregation has not completed or is not enabled.


________________________________
r7raul1...@163.com

From: Hitesh Shah <hit...@apache.org>
Date: 2015-05-29 23:31
To: user <user@tez.apache.org>
Subject: Re: Tez launch container error when using UseG1GC
To clarify, given that the error is showing up with 
container_1432885077153_0004_01_000005, that means that the AM launched 
properly.

Use “bin/yarn logs -applicationId application_1432885077153_0004” to get the
logs. See if there are any errors in the logs for
container_1432885077153_0004_01_000005. If there are none, you will need to
search for “Assigning container to task” for the above container in the AM’s
logs. From this log line, you will see which host the container belongs to, and
you should then look at that host’s NodeManager logs and search for the container id.
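
For example (a rough sketch, not from the thread; paths and patterns are
assumptions to adapt to your cluster):

  bin/yarn logs -applicationId application_1432885077153_0004 | grep "Assigning container to task"

should show which host container_1432885077153_0004_01_000005 was assigned to;
on that host, you could then search the NodeManager daemon log for the
container id:

  grep container_1432885077153_0004_01_000005 $HADOOP_HOME/logs/yarn-*-nodemanager-*.log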

The above would be a lot simpler if you had the UI set up to work against 0.5.3,
but it may still require you to dig through the NodeManager logs.

thanks
— Hitesh

On May 29, 2015, at 3:48 AM, Jianfeng (Jeff) Zhang <jzh...@hortonworks.com> wrote:

>
> Could you check the YARN app logs to see what the error is? If there’s
> still no useful info, you may refer to the YARN RM/NM logs.
>
>
>
>
> Best Regards,
> Jeff Zhang
>
>
> From: "r7raul1...@163.com<mailto:r7raul1...@163.com>" 
> <r7raul1...@163.com<mailto:r7raul1...@163.com>>
> Reply-To: user <user@tez.apache.org<mailto:user@tez.apache.org>>
> Date: Friday, May 29, 2015 at 4:16 PM
> To: user <user@tez.apache.org<mailto:user@tez.apache.org>>
> Subject: Re: Tez launch container error when using UseG1GC
>
> BTW, my tez-site.xml content is:
> <configuration>
>   <property>
>     <name>tez.lib.uris</name>
>     <value>hdfs:///apps/tez-0.5.3/tez-0.5.3.tar.gz</value>
>   </property>
>   <property>
>     <name>tez.task.generate.counters.per.io</name>
>     <value>true</value>
>   </property>
>   <property>
>     <description>Log history using the Timeline Server</description>
>     <name>tez.history.logging.service.class</name>
>     <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
>   </property>
>   <property>
>     <description>Publish configuration information to the Timeline Server</description>
>     <name>tez.runtime.convert.user-payload.to.history-text</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>tez.am.launch.cmd-opts</name>
>     <value>-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/</value>
>   </property>
> </configuration>
>
> r7raul1...@163.com
>
> From: r7raul1...@163.com
> Date: 2015-05-29 16:15
> To: user
> Subject: Tez launch container error when using UseG1GC
> I changed the value of mapreduce.map.java.opts from
> -Djava.net.preferIPv4Stack=true -Xmx825955249 to
> -Djava.net.preferIPv4Stack=true -XX:+UseG1GC -Xmx825955249.
>
> When I run a query with Hive 1.1.0 + Tez 0.5.3 on Hadoop 2.5.0:
>
> set mapreduce.framework.name=yarn-tez;
> set hive.execution.engine=tez;
> select userid,count(*) from u_data group by userid order by userid;
> the query returns an error.
> I found this error:
> 2015-05-29 16:02:39,064 WARN [AsyncDispatcher event handler] container.AMContainerImpl: Container container_1432885077153_0004_01_000005 finished with diagnostics set to [Container failed. Exception from container-launch.
> Container id: container_1432885077153_0004_01_000005
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
> at org.apache.hadoop.util.Shell.run(Shell.java:455)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
> at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:196)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
> But when I try:
> hive> set hive.execution.engine=mr;
> hive> set mapreduce.framework.name=yarn;
> hive> select userid,count(*) from u_data group by userid order by userid limit 1;
> Query ID = hdfs_20150529160606_d550bca4-0341-4eb0-aace-a9018bfbb7a9
> Total jobs = 2
> Launching Job 1 out of 2
> Number of reduce tasks not specified. Estimated from input data size: 1
> In order to change the average load for a reducer (in bytes):
> set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
> set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
> set mapreduce.job.reduces=<number>
> Starting Job = job_1432885077153_0005, Tracking URL = http://localhost:8088/proxy/application_1432885077153_0005/
> Kill Command = /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop/bin/hadoop job -kill job_1432885077153_0005
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
> 2015-05-29 16:06:34,863 Stage-1 map = 0%, reduce = 0%
> 2015-05-29 16:06:40,066 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.72 sec
> 2015-05-29 16:06:48,366 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.96 sec
> MapReduce Total cumulative CPU time: 2 seconds 960 msec
> Ended Job = job_1432885077153_0005
> Launching Job 2 out of 2
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
> set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
> set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
> set mapreduce.job.reduces=<number>
> Starting Job = job_1432885077153_0006, Tracking URL = http://localhost:8088/proxy/application_1432885077153_0006/
> Kill Command = /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop/bin/hadoop job -kill job_1432885077153_0006
> Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
> 2015-05-29 16:07:03,333 Stage-2 map = 0%, reduce = 0%
> 2015-05-29 16:07:07,485 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.2 sec
> 2015-05-29 16:07:15,739 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 2.35 sec
> MapReduce Total cumulative CPU time: 2 seconds 350 msec
> Ended Job = job_1432885077153_0006
> MapReduce Jobs Launched:
> Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.96 sec HDFS Read: 1985399 HDFS Write: 20068 SUCCESS
> Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 2.35 sec HDFS Read: 24481 HDFS Write: 6 SUCCESS
> Total MapReduce CPU Time Spent: 5 seconds 310 msec
>
> That's ok.
>
>
> r7raul1...@163.com
