[jira] [Updated] (YARN-10264) Add container launch related env / classpath debug info to container logs when a container fails

2021-01-08 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10264:
--
Parent: YARN-10323
Issue Type: Sub-task  (was: Task)

> Add container launch related env / classpath debug info to container logs 
> when a container fails
> 
>
> Key: YARN-10264
> URL: https://issues.apache.org/jira/browse/YARN-10264
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Szilard Nemeth
> Assignee: Szilard Nemeth
> Priority: Major
>
> Sometimes when a container fails to launch, it can be pretty hard to figure 
> out why it has failed.
> Similar to YARN-4309, we can add a switch that controls whether the 
> environment variables and the Java classpath are printed.
> As a bonus, 
> [jdeps|https://docs.oracle.com/javase/8/docs/technotes/tools/unix/jdeps.html] 
> could also be used to print verbose information about the classpath. 
> When log aggregation occurs, all of this information is collected 
> automatically, which makes debugging such container launch failures much easier.
> Below is an example output when the user faces a classpath configuration 
> issue while launching an application: 
> {code:java}
> End of LogType:prelaunch.err
> **
> 2020-04-19 05:49:12,145 DEBUG:app_info:Diagnostics of the failed app
> 2020-04-19 05:49:12,145 DEBUG:app_info:Application 
> application_1587300264561_0001 failed 2 times due to AM Container for 
> appattempt_1587300264561_0001_02 exited with  exitCode: 1
> Failing this attempt.Diagnostics: [2020-04-19 12:45:01.955]Exception from 
> container-launch.
> Container id: container_e60_1587300264561_0001_02_01
> Exit code: 1
> Exception message: Launch container failed
> Shell output: main : command provided 1
> main : run as user is systest
> main : requested yarn user is systest
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file 
> /dataroot/ycloud/yarn/nm/nmPrivate/application_1587300264561_0001/container_e60_1587300264561_0001_02_01/container_e60_1587300264561_0001_02_01.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> [2020-04-19 12:45:01.984]Container exited with a non-zero exit code 1. Error 
> file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> Last 4096 bytes of stderr :
> Error: Could not find or load main class 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster
> Please check whether your etc/hadoop/mapred-site.xml contains the below 
> configuration:
> 
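> <property>
>   <name>yarn.app.mapreduce.am.env</name>
>   <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
> </property>
> <property>
>   <name>mapreduce.map.env</name>
>   <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
> </property>
> <property>
>   <name>mapreduce.reduce.env</name>
>   <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
> </property>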
> 
> [2020-04-19 12:45:01.985]Container exited with a non-zero exit code 1. Error 
> file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> Last 4096 bytes of stderr :
> Error: Could not find or load main class 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster
> Please check whether your etc/hadoop/mapred-site.xml contains the below 
> configuration:
> 
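> <property>
>   <name>yarn.app.mapreduce.am.env</name>
>   <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
> </property>
> <property>
>   <name>mapreduce.map.env</name>
>   <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
> </property>
> <property>
>   <name>mapreduce.reduce.env</name>
>   <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
> </property>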
> 
> For more detailed output, check the application tracking page: 
> http://quasar-plnefj-2.quasar-plnefj.root.hwx.site:8088/cluster/app/application_1587300264561_0001
>  Then click on links to logs of each attempt.
> ...
> 2020-04-19 05:49:12,148 INFO:util:* End test_app_API 
> (yarn.suite.YarnAPITests) *
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10264) Add container launch related env / classpath debug info to container logs when a container fails

2020-05-13 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-10264:
--
Description: 
Sometimes when a container fails to launch, it can be pretty hard to figure out 
why it has failed.

Similar to YARN-4309, we can add a switch that controls whether the 
environment variables and the Java classpath are printed.
As a bonus, 
[jdeps|https://docs.oracle.com/javase/8/docs/technotes/tools/unix/jdeps.html] 
could also be used to print verbose information about the classpath. 
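
A minimal sketch of the idea, for illustration only. It is not the YARN implementation: the switch name EXAMPLE_LAUNCH_DEBUG, the class LaunchDebugInfo and the method render are hypothetical names made up for this example. It assumes the switch is read from the process environment and that the dump goes to stderr, so that it ends up in the container's log files.

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

class LaunchDebugInfo {
    // Hypothetical switch name, invented for this sketch; not a real YARN property.
    static final String DEBUG_SWITCH = "EXAMPLE_LAUNCH_DEBUG";

    /** Renders the environment and the classpath as text, one entry per line. */
    static String render(Map<String, String> env, String classpath) {
        StringBuilder out = new StringBuilder("=== Environment ===\n");
        // Sort by variable name so that dumps from two containers can be diffed.
        for (Map.Entry<String, String> e : new TreeMap<String, String>(env).entrySet()) {
            out.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        out.append("=== Classpath ===\n");
        for (String entry : classpath.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue;
            }
            // For wildcard entries such as "lib/*", check the directory itself.
            String path = entry.endsWith("*")
                    ? entry.substring(0, entry.length() - 1)
                    : entry;
            // Flag entries that are not present on the local file system.
            String marker = new File(path).exists() ? "" : "  (missing)";
            out.append(entry).append(marker).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Print only when the switch is set to "true"; an unset variable means off.
        if (Boolean.parseBoolean(System.getenv(DEBUG_SWITCH))) {
            System.err.print(render(System.getenv(),
                    System.getProperty("java.class.path")));
        }
    }
}
```

Marking missing classpath entries is a choice made for this sketch; it is meant to make a misconfigured path stand out when the cause is a class that cannot be loaded.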

When log aggregation occurs, all of this information is collected 
automatically, which makes debugging such container launch failures much easier.

Below is an example output when the user faces a classpath configuration issue 
while launching an application: 

{code:java}
End of LogType:prelaunch.err
**
2020-04-19 05:49:12,145 DEBUG:app_info:Diagnostics of the failed app
2020-04-19 05:49:12,145 DEBUG:app_info:Application 
application_1587300264561_0001 failed 2 times due to AM Container for 
appattempt_1587300264561_0001_02 exited with  exitCode: 1
Failing this attempt.Diagnostics: [2020-04-19 12:45:01.955]Exception from 
container-launch.
Container id: container_e60_1587300264561_0001_02_01
Exit code: 1
Exception message: Launch container failed
Shell output: main : command provided 1
main : run as user is systest
main : requested yarn user is systest
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file 
/dataroot/ycloud/yarn/nm/nmPrivate/application_1587300264561_0001/container_e60_1587300264561_0001_02_01/container_e60_1587300264561_0001_02_01.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...


[2020-04-19 12:45:01.984]Container exited with a non-zero exit code 1. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster

Please check whether your etc/hadoop/mapred-site.xml contains the below 
configuration:

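<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>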


[2020-04-19 12:45:01.985]Container exited with a non-zero exit code 1. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster

Please check whether your etc/hadoop/mapred-site.xml contains the below 
configuration:

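<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>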


For more detailed output, check the application tracking page: 
http://quasar-plnefj-2.quasar-plnefj.root.hwx.site:8088/cluster/app/application_1587300264561_0001
 Then click on links to logs of each attempt.
...
2020-04-19 05:49:12,148 INFO:util:* End test_app_API (yarn.suite.YarnAPITests) *
{code}


  was:
Sometimes when a container fails to launch, it can be pretty hard to figure out 
why it failed.

Similar to YARN-4309, we can add a switch to control if the printing of 
environment variables and Java classpath should be done.
As a bonus, 
[jdeps|https://docs.oracle.com/javase/8/docs/technotes/tools/unix/jdeps.html] 
could also be utilized to print some verbose info about the classpath. 

When log aggregation occurs, all this information will automatically get 
collected and make debugging such container launch failures much easier.

Below is an example output when the user faces a classpath configuration issue: 

{code:java}
End of LogType:prelaunch.err
**
2020-04-19 05:49:12,145 DEBUG:app_info:Diagnostics of the failed app
2020-04-19 05:49:12,145 DEBUG:app_info:Application 
application_1587300264561_0001 failed 2 times due to AM Container for 
appattempt_1587300264561_0001_02 exited with  exitCode: 1
Failing this attempt.Diagnostics: [2020-04-19 12:45:01.955]Exception from 
container-launch.
Container id: container_e60_1587300264561_0001_02_01
Exit code: 1
Exception message: Launch container failed
Shell output: main : command provided 1
main : run as user is systest
main : requested yarn user is systest
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file 
/dataroot/ycloud/yarn/nm/nmPrivate/application_1587300264561_0001/c