Hi Guys,

Just a update on this issues. I turned on our debugger when running our
jobs. Was able to nail it down to "Connection refused" error. I am not sure
what has caused this error and how it can be prevent. Looking at the
datanodes when the job was running, I did not notice anything wrong. If
anyone has seen this issue and can shred some light on it, that would be
great. Error logs are given below. Interesting thing is it does not happen
every time. It a sporadic occurrence and if we restart our hive job again,
it goes through without any errors.

*Error Logs:*

16/05/10 16:10:14 DEBUG client.TezClientUtils: Connecting to Tez AM at
maskserver21/1.1.1.1:56372
16/05/10 16:10:15 DEBUG security.UserGroupInformation: PrivilegedAction
as:aiprod (auth:SIMPLE)
from:org.apache.tez.client.TezClientUtils.getAMProxy(TezClientUtils.java:826)
16/05/10 16:10:15 DEBUG ipc.Client: getting client out of cache:
org.apache.hadoop.ipc.Client@1c8825a5
16/05/10 16:10:40 DEBUG ipc.Client: IPC Client (1495065854) connection to
maskserver1/2.2.2.2:8020 from aiprod: closed
16/05/10 16:10:40 DEBUG ipc.Client: IPC Client (1495065854) connection to
maskserver1/2.2.2.2:8020 from aiprod: stopped, remaining connections 1
16/05/10 16:10:44 DEBUG ipc.Client: IPC Client (1495065854) connection to
maskserver11/2.2.2.21:8050 from aiprod: closed
16/05/10 16:10:44 DEBUG ipc.Client: IPC Client (1495065854) connection to
maskserver11/2.2.2.21:8050 from aiprod: stopped, remaining connections 0
16/05/10 16:21:10 DEBUG hdfs.LeaseRenewer: Lease renewer daemon for [] with
renew id 1 executed
16/05/10 16:21:10 DEBUG hdfs.LeaseRenewer: Lease renewer daemon for [] with
renew id 1 expired
16/05/10 16:21:10 DEBUG hdfs.LeaseRenewer: Lease renewer daemon for [] with
renew id 1 exited
16/05/10 16:21:11 DEBUG ipc.Client: The ping interval is 60000 ms.
16/05/10 16:21:11 DEBUG ipc.Client: Connecting to maskserver21/1.1.1.1:56372
16/05/10 16:21:12 INFO ipc.Client: Retrying connect to server: maskserver21/
1.1.1.1:56372. Already tried 0 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000
MILLISECONDS)
16/05/10 16:21:13 INFO ipc.Client: Retrying connect to server: maskserver21/
1.1.1.1:56372. Already tried 1 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000
MILLISECONDS)
.
.
.
16/05/10 16:22:01 INFO ipc.Client: Retrying connect to server: maskserver21/
1.1.1.1:56372. Already tried 49 time(s); retry policy is
RetryUpToMaximumCountWithFixedSleep(maxRetries=50, sleepTime=1000
MILLISECONDS)
16/05/10 16:22:01 DEBUG ipc.Client: closing ipc connection to maskserver21/
1.1.1.1:56372: Connection refused
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
        at
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
        at
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:706)
        at
org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:369)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1522)
        at org.apache.hadoop.ipc.Client.call(Client.java:1439)
        at org.apache.hadoop.ipc.Client.call(Client.java:1400)
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at $Proxy31.getAMStatus(Unknown Source)
        at
org.apache.tez.client.TezClient.getAppMasterStatus(TezClient.java:535)
        at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:610)
        at
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:212)
        at
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:122)
        at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:454)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:672)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
16/05/10 16:22:01 DEBUG ipc.Client: IPC Client (1495065854) connection to
maskserver21/1.1.1.1:56372 from aiprod: closed
16/05/10 16:22:01 INFO client.TezClient: Failed to retrieve AM Status via
proxy
com.google.protobuf.ServiceException: java.net.ConnectException: Call From
maskserver2/3.3.3.3 to maskserver21:56372 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:246)
        at $Proxy31.getAMStatus(Unknown Source)
        at
org.apache.tez.client.TezClient.getAppMasterStatus(TezClient.java:535)
        at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:610)
        at
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:212)
        at
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:122)
        at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:454)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:672)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.net.ConnectException: Call From maskserver2/3.3.3.3 to
maskserver21:56372 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
        at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
        at org.apache.hadoop.ipc.Client.call(Client.java:1473)
        at org.apache.hadoop.ipc.Client.call(Client.java:1400)
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        ... 14 more
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
        at
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:608)
        at
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:706)
        at
org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:369)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1522)
        at org.apache.hadoop.ipc.Client.call(Client.java:1439)
        ... 16 more
16/05/10 16:22:02 DEBUG ipc.Client: The ping interval is 60000 ms.
16/05/10 16:22:02 DEBUG ipc.Client: Connecting to maskserver11/2.2.2.21:8050
16/05/10 16:22:02 DEBUG ipc.Client: IPC Client (1495065854) connection to
maskserver11/2.2.2.21:8050 from aiprod: starting, having connections 1
16/05/10 16:22:02 DEBUG ipc.Client: IPC Client (1495065854) connection to
maskserver11/2.2.2.21:8050 from aiprod sending #70
16/05/10 16:22:02 DEBUG ipc.Client: IPC Client (1495065854) connection to
maskserver11/2.2.2.21:8050 from aiprod got value #70
16/05/10 16:22:02 DEBUG ipc.ProtobufRpcEngine: Call: getApplicationReport
took 3ms
Exception in thread "main" java.lang.RuntimeException:
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown.
No cluster diagnostics found.
        at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:457)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:672)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already
shutdown. No cluster diagnostics found.
        at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:612)
        at
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:212)
        at
org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:122)
        at
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:454)
        ... 8 more


Regards
Khaja Hussain

On Sat, Apr 9, 2016 at 5:13 PM, Khaja Hussain <khaja....@gmail.com> wrote:

> Hi Guys -
>
> I am getting this error messages "java.lang.RuntimeException:
> org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown.
> No cluster diagnostics found" for some of the jobs I am running. The
> cluster are heavily loaded. I am suspecting this to be some kind of
> time-out issue. Any pointer or tez setting that needs to be looked into
> would be helpful. From the log I can infer even the application has not
> started.
>
>
> *Logs:*
>
> *16/04/09 14:15:11 WARN conf.HiveConf: HiveConf of name
> hive.semantic.analyzer.factory.impl does not exist*
>
> *Logging initialized using configuration in
> file:/etc/hive/conf/hive-log4j.properties*
> *SLF4J: Class path contains multiple SLF4J bindings.*
> *SLF4J: Found binding in
> [jar:file:/usr/hdp/2.2.4.2-2/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]*
> *SLF4J: Found binding in
> [jar:file:/usr/hdp/2.2.4.2-2/hive/lib/hive-jdbc-0.14.0.2.2.4.2-2-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]*
> *SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings
> <http://www.slf4j.org/codes.html#multiple_bindings> for an explanation.*
> *SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]*
> *Exception in thread "main" java.lang.RuntimeException:
> org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown.
> No cluster diagnostics found.*
> *        at
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:457)*
> *        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:672)*
> *        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)*
> *        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)*
> *        at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)*
> *        at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)*
> *        at java.lang.reflect.Method.invoke(Method.java:597)*
> *        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)*
> *        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)*
> *Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has
> already shutdown. No cluster diagnostics found.*
> *        at
> org.apache.tez.client.TezClient.waitTillReady(TezClient.java:612)*
> *        at
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:212)*
> *        at
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:122)*
> *        at
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:454)*
> *        ... 8 more*
>
>
> Regards
> Khaja Hussain
>

Reply via email to