Yeah, this is an annoying combination of the ambari-agent and Thrift.
Good on you to get to the bottom of it, and thanks for commenting back
to the list!
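The agent opens a socket to the Thrift service to do the Ambari "port
check", but sends no data (it's only checking network connectivity).
If the server accepts the connection, the agent just hangs up, assuming
everything is good. However, Thrift's SASL server transport is still
waiting to read the first SASL message at that point (see
handleSaslStartMessage in the trace), so the closed connection surfaces
as a TTransportException and gets logged as an error instead of being
quietly dropped.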
IMO, Thrift shouldn't log this as an error, but it's what we have :)
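For anyone who wants to reproduce the effect, here is a rough Python sketch of a connect-and-hang-up port check. This is not the actual ambari-agent code; the names port_check, serve_once and demo are made up for illustration, and the tiny listener only stands in for a server that expects a handshake message first (as the SASL transport does).

```python
import socket
import threading


def port_check(host, port, timeout=1.5):
    """Hypothetical port check: open a TCP connection, send nothing, close it."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def serve_once(listener, received):
    """Stand-in server: accept one connection and try to read a message header."""
    conn, _ = listener.accept()
    with conn:
        # A SASL-style server would block here waiting for the start message.
        # When the peer closes without sending anything, recv() returns b"".
        received.append(conn.recv(5))


def demo():
    """Run the port check against the stand-in server on an ephemeral local port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    received = []
    worker = threading.Thread(target=serve_once, args=(listener, received))
    worker.start()

    ok = port_check("127.0.0.1", port)
    worker.join(timeout=5)
    listener.close()
    return ok, received
```

The check reports success, while the server side sees an empty read (end of stream) where it expected data. In Thrift's case that end of stream is what ends up as the TTransportException in the log.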
On 12/8/19 12:12 PM, James Srinivasan wrote:
Ahh, looks like it was the ambari-agent process, part of HDP. Since
that runs on the same machine, it wasn't in my tcpdump.
Second time ambari-agent has done something unexpected for me! (first
was renewing a keytab behind my back)
On Sun, 8 Dec 2019 at 12:39, James Srinivasan
<[email protected]> wrote:
I'm running Accumulo 1.7 (HDP3) on a Kerberized cluster. When trying
to debug some client libthrift issues, I noticed errors like this
every minute (pretty much on the minute) in my tserver logs:
2019-12-08 12:35:01,255 [server.TThreadPoolServer] ERROR: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
    at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:51)
    at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:48)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:360)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
    at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory.getTransport(UGIAssumingTransportFactory.java:48)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:208)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:178)
    at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
    at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
    at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
    ... 11 more
I get them even if there are no clients running and the tserver is the
only Accumulo process running in the cluster (no master, no other
tservers, etc.). Curiously, I don't see any network traffic on port
9997. Any idea how to debug further?
Many thanks,
James