Hi Aihua,

Could you clarify when the job was submitted: a) before starting the
upgrade, b) after the RM upgrade and before the NM upgrade, or c) after the
YARN upgrade was fully complete?
Typically, the suggested upgrade order is NMs first and the RM second.

Regarding the NM warning messages, you might be hitting
https://issues.apache.org/jira/browse/HADOOP-11692.

Do any subsequent jobs succeed after the upgrade?
-Rohith Sharma K S

On Thu, 7 Feb 2019 at 03:20, Aihua Xu <aihu...@uber.com.invalid> wrote:

> Hi all,
>
> I'm investigating the rolling upgrade process from Hadoop 2.6 to Hadoop
> 2.9.1. I'm trying to upgrade ResourceManager first and then try to upgrade
> NodeManager. When I submit a yarn job, RM fails with the following
> exception:
>
>  Application application_1549408943468_0001 failed 2 times due to Error 
> launching appattempt_1549408943468_0001_000002. Got exception: 
> java.io.IOException: Failed on local exception: java.io.IOException: 
> java.io.EOFException; Host Details : local host is: 
> "hadoopbenchaqjm01-sjc1/10.67.2.171"; destination host is: 
> "hadoopbencha22-sjc1.prod.uber.internal":8041;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:805)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1497)
> at org.apache.hadoop.ipc.Client.call(Client.java:1439)
> at org.apache.hadoop.ipc.Client.call(Client.java:1349)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy87.startContainers(Unknown Source)
> at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy88.startContainers(Unknown Source)
> at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:122)
> at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:307)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: java.io.EOFException
> at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:757)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1889)
> at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:720)
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:813)
> at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:411)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1554)
> at org.apache.hadoop.ipc.Client.call(Client.java:1385)
> ... 20 more
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1798)
> at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:365)
> at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:615)
> at org.apache.hadoop.ipc.Client$Connection.access$2200(Client.java:411)
> at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:800)
> at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:796)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1889)
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:795)
> ... 23 more
>
>
> and NM with
>
> 2019-02-06 00:29:20,214 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 10.67.2.171:54588:null (DIGEST-MD5: IO error acquiring password) with true cause: (null)
>
>
> I'm wondering if it's a known issue and anybody has an insight for it.
>
> Thanks,
> Aihua
>
>
>
