Alex,

That sounds like a very likely situation.

I read in the first JIRA that tokens are now used in non-secure setups as well, 
which explains my earlier SSL question.

Is the solution simply to delete those staging files from the cluster?
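
Something like the following is what I had in mind, assuming the stale 
attempt's staging directory really is orphaned (the path is just the one from 
the job log, so treat it as an example):

  # remove the leftover staging directory for the failed job
  # (run as the user that submitted the job)
  hdfs dfs -rm -r \
    hdfs://mycluster.com:8020/user/cloudera/.staging/job_1424003606313_0001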

- rd 


-------- Original message --------
From: Alexander Alten-Lorenz <wget.n...@gmail.com> 
Date:02/19/2015  7:43 AM  (GMT-05:00) 
To: user@hadoop.apache.org 
Subject: Re: Yarn AM is abending job when submitting a remote job to cluster 

Hi,

https://issues.apache.org/jira/browse/YARN-1116
https://issues.apache.org/jira/browse/YARN-1058

Looks like the history server had an unclean shutdown, or a previous job didn't 
finish or wasn't cleaned up after completion (2015-02-15 07:51:07,241 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , 
Ident: (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@33be1aa0) … Previous 
history file is at 
hdfs://mycluster.com:8020/user/cloudera/.staging/job_1424003606313_0001/job_1424003606313_0001_1.jhist).
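
If you just need the job to go through while that gets cleaned up, one 
workaround (which I haven't verified against your setup) is to disable AM job 
recovery for a single submission, so a restarted AM won't try to replay the 
old .jhist file. The jar and driver names here are placeholders:

  # requires the driver to go through ToolRunner so -D and -conf are honored
  hadoop jar yourjob.jar com.example.YourDriver \
    -D yarn.app.mapreduce.am.job.recovery.enable=false \
    -conf hadoop-cluster.xml <input> <output>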

BR,
Alex


> On 19 Feb 2015, at 13:27, Roland DePratti <roland.depra...@cox.net> wrote:
> 
> Daemeon,
>  
> Thanks for the reply. I have about six months of exposure to Hadoop and am new 
> to SSL, so I did some digging after reading your message.
>  
> In the HDFS config, I have hadoop.ssl.enabled using the default, which is 
> ‘false’ (which I understand applies to all Hadoop daemons).
>  
> I assumed this meant that it is not in use and not a factor in job submission 
> (SSL certs not needed).
>  
> Do I misunderstand? Are you saying that it needs to be set to ‘true’, with 
> valid certs and the keystore/truststore set up, for me to submit a remote job? 
> (This is a POC setup with no exposure outside my environment.)
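>  
> For reference, one quick way to confirm the effective value from the client 
> node (assuming getconf reflects what the job client actually sees) is:
>  
>   # prints the effective hadoop.ssl.enabled value for this client config
>   hdfs getconf -confKey hadoop.ssl.enabled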
>  
> -  rd
>  
> From: daemeon reiydelle [mailto:daeme...@gmail.com] 
> Sent: Wednesday, February 18, 2015 10:22 PM
> To: user@hadoop.apache.org
> Subject: Re: Yarn AM is abending job when submitting a remote job to cluster
>  
> I would guess you do not have your SSL certs set up, client or server, based 
> on the error. 
> 
> 
> .......
> “Life should not be a journey to the grave with the intention of arriving 
> safely in a
> pretty and well preserved body, but rather to skid in broadside in a cloud of 
> smoke,
> thoroughly used up, totally worn out, and loudly proclaiming “Wow! What a 
> Ride!” 
> - Hunter Thompson
> 
> Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872
>  
> On Wed, Feb 18, 2015 at 5:19 PM, Roland DePratti <roland.depra...@cox.net> wrote:
> I have been searching for a handle on a problem with very few clues. Any help 
> pointing me in the right direction would be huge.
> I have not received any input from the Cloudera Google Groups. Perhaps this is 
> more YARN-related, and I am hoping I have more luck here.
> Any help is greatly appreciated.
>  
> I am running a Hadoop cluster using CDH5.3. I also have a client machine with 
> a standalone one-node setup (VM).
>  
> All environments are running CentOS 6.6.
>  
> I have submitted some Java MapReduce jobs locally on both the cluster and the 
> standalone environment with successful completions.
>  
> I can submit a remote HDFS job from client to cluster using -conf 
> hadoop-cluster.xml (see below) and get data back from the cluster with no 
> problem.
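>  
> For example, plain HDFS access from the client with the same config file is 
> the kind of thing that works (the path here is just illustrative):
>  
>   # read from the cluster using the client-side cluster config
>   hadoop fs -conf hadoop-cluster.xml -ls /user/cloudera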
> 
> When I submit the MapReduce jobs remotely, I get an AM error:
>  
> AM fails the job with the error: 
> 
>            SecretManager$InvalidToken: appattempt_1424003606313_0001_000002 
> not found in AMRMTokenSecretManager
> 
> I searched /var/log/secure on both the client and the cluster and found no 
> unusual messages.
> 
> Here are the contents of hadoop-cluster.xml:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> 
> <!--generated by Roland-->
> <configuration>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>hdfs://mycluster:8020</value>
>   </property>
>   <property>
>     <name>mapreduce.jobtracker.address</name>
>     <value>hdfs://mycluster:8032</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.address</name>
>     <value>hdfs://mycluster:8032</value>
>   </property>
> </configuration>
> 
> Here is the output from the job log on the cluster:  
> 
> 2015-02-15 07:51:06,544 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1424003606313_0001_000002
> 2015-02-15 07:51:06,949 WARN [main] org.apache.hadoop.conf.Configuration: 
> job.xml:an attempt to override final parameter: 
> hadoop.ssl.require.client.cert;  Ignoring.
> 2015-02-15 07:51:06,952 WARN [main] org.apache.hadoop.conf.Configuration: 
> job.xml:an attempt to override final parameter: 
> mapreduce.job.end-notification.max.retry.interval;  Ignoring.
> 2015-02-15 07:51:06,952 WARN [main] org.apache.hadoop.conf.Configuration: 
> job.xml:an attempt to override final parameter: hadoop.ssl.client.conf;  
> Ignoring.
> 2015-02-15 07:51:06,954 WARN [main] org.apache.hadoop.conf.Configuration: 
> job.xml:an attempt to override final parameter: 
> hadoop.ssl.keystores.factory.class;  Ignoring.
> 2015-02-15 07:51:06,957 WARN [main] org.apache.hadoop.conf.Configuration: 
> job.xml:an attempt to override final parameter: hadoop.ssl.server.conf;  
> Ignoring.
> 2015-02-15 07:51:06,973 WARN [main] org.apache.hadoop.conf.Configuration: 
> job.xml:an attempt to override final parameter: 
> mapreduce.job.end-notification.max.attempts;  Ignoring.
> 2015-02-15 07:51:07,241 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
> 2015-02-15 07:51:07,241 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, 
> Service: , Ident: 
> (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@33be1aa0)
> 2015-02-15 07:51:07,332 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Using mapred newApiCommitter.
> 2015-02-15 07:51:07,627 WARN [main] org.apache.hadoop.conf.Configuration: 
> job.xml:an attempt to override final parameter: 
> hadoop.ssl.require.client.cert;  Ignoring.
> 2015-02-15 07:51:07,632 WARN [main] org.apache.hadoop.conf.Configuration: 
> job.xml:an attempt to override final parameter: 
> mapreduce.job.end-notification.max.retry.interval;  Ignoring.
> 2015-02-15 07:51:07,632 WARN [main] org.apache.hadoop.conf.Configuration: 
> job.xml:an attempt to override final parameter: hadoop.ssl.client.conf;  
> Ignoring.
> 2015-02-15 07:51:07,639 WARN [main] org.apache.hadoop.conf.Configuration: 
> job.xml:an attempt to override final parameter: 
> hadoop.ssl.keystores.factory.class;  Ignoring.
> 2015-02-15 07:51:07,645 WARN [main] org.apache.hadoop.conf.Configuration: 
> job.xml:an attempt to override final parameter: hadoop.ssl.server.conf;  
> Ignoring.
> 2015-02-15 07:51:07,663 WARN [main] org.apache.hadoop.conf.Configuration: 
> job.xml:an attempt to override final parameter: 
> mapreduce.job.end-notification.max.attempts;  Ignoring.
> 2015-02-15 07:51:08,237 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2015-02-15 07:51:08,429 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config 
> null
> 2015-02-15 07:51:08,499 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter is 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
> 2015-02-15 07:51:08,526 INFO [main] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
> org.apache.hadoop.mapreduce.jobhistory.EventType for class 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler
> 2015-02-15 07:51:08,527 INFO [main] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
> org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher
> 2015-02-15 07:51:08,561 INFO [main] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
> org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskEventDispatcher
> 2015-02-15 07:51:08,562 INFO [main] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
> org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher
> 2015-02-15 07:51:08,566 INFO [main] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
> 2015-02-15 07:51:08,568 INFO [main] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
> org.apache.hadoop.mapreduce.v2.app.speculate.Speculator$EventType for class 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$SpeculatorEventDispatcher
> 2015-02-15 07:51:08,568 INFO [main] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
> org.apache.hadoop.mapreduce.v2.app.rm.ContainerAllocator$EventType for class 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter
> 2015-02-15 07:51:08,570 INFO [main] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncher$EventType for 
> class org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter
> 2015-02-15 07:51:08,599 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Recovery is enabled. Will try 
> to recover from previous life on best effort basis.
> 2015-02-15 07:51:08,642 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Previous history file is at 
> hdfs://mycluster.com:8020/user/cloudera/.staging/job_1424003606313_0001/job_1424003606313_0001_1.jhist
> 2015-02-15 07:51:09,147 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Read completed tasks from 
> history 0
> 2015-02-15 07:51:09,193 INFO [main] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Registering class 
> org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler
> 2015-02-15 07:51:09,222 INFO [main] 
> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from 
> hadoop-metrics2.properties
> 2015-02-15 07:51:09,277 INFO [main] 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period 
> at 10 second(s).
> 2015-02-15 07:51:09,277 INFO [main] 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MRAppMaster metrics system 
> started
> 2015-02-15 07:51:09,286 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Adding job token for 
> job_1424003606313_0001 to jobTokenSecretManager
> 2015-02-15 07:51:09,306 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Not uberizing 
> job_1424003606313_0001 because: not enabled; too much RAM;
> 2015-02-15 07:51:09,324 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Input size for job 
> job_1424003606313_0001 = 5343207. Number of splits = 5
> 2015-02-15 07:51:09,325 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Number of reduces for 
> job job_1424003606313_0001 = 1
> 2015-02-15 07:51:09,325 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
> job_1424003606313_0001Job Transitioned from NEW to INITED
> 2015-02-15 07:51:09,327 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster launching normal, 
> non-uberized, multi-container job job_1424003606313_0001.
> 2015-02-15 07:51:09,387 INFO [main]
