Yeah, this is an annoying combination of the ambari-agent and Thrift. Good on you for getting to the bottom of it, and thanks for reporting back to the list!

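The agent opens a socket to the Thrift service to do the Ambari "port check", but sends no data (as it's just checking network connectivity). If the server accepts the connection, the agent just hangs up, assuming everything is good. However, at that point the Thrift server is still waiting to read the client's first SASL message (see handleSaslStartMessage in the trace), so the closed connection surfaces as a TTransportException and gets logged as an error instead of being quietly dropped.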
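
To make the sequence concrete, here is a rough Python sketch of a connect-and-hang-up check, together with a toy server that expects bytes from the client. It is not the actual ambari-agent or Thrift code: the names port_check, read_first_bytes and demo are made up for illustration, and the toy server only mimics "wait for a handshake message" by trying to read a few bytes.

```python
import socket
import threading


def port_check(host, port, timeout=5.0):
    """Hypothetical port check: open a TCP connection, send nothing, close."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True  # connection accepted; hang up without sending data
    except OSError:
        return False


def read_first_bytes(listener, result):
    """Toy stand-in for a server that expects a handshake from the client."""
    conn, _ = listener.accept()
    with conn:
        # An empty bytes object here means the peer closed without sending.
        result.append(conn.recv(5))


def demo():
    """Run the check against a throwaway local listener."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    result = []
    server = threading.Thread(target=read_first_bytes, args=(listener, result))
    server.start()
    ok = port_check("127.0.0.1", port)
    server.join(timeout=5)
    listener.close()
    return ok, result


if __name__ == "__main__":
    print(demo())
```

From the checker's side the connection succeeded, so it reports the port as healthy. From the server's side the very first read hits end-of-stream, which is the situation a server expecting a handshake has to treat as a failed connection.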

IMO, Thrift shouldn't log this as an error, but it's what we have :)

On 12/8/19 12:12 PM, James Srinivasan wrote:
Ahh, looks like it was the ambari-agent process, part of HDP. Since
that runs on the same machine, it wasn't in my tcpdump.

Second time ambari-agent has done something unexpected for me! (first
was renewing a keytab behind my back)

On Sun, 8 Dec 2019 at 12:39, James Srinivasan wrote:

I'm running Accumulo 1.7 (HDP3) on a Kerberized cluster. When trying
to debug some client libthrift issues, I noticed errors like this
every minute (pretty much on the minute) in my tserver logs:

2019-12-08 12:35:01,255 [server.TThreadPoolServer] ERROR: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
         at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
         at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:51)
         at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory$1.run(UGIAssumingTransportFactory.java:48)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:360)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
         at org.apache.accumulo.core.rpc.UGIAssumingTransportFactory.getTransport(UGIAssumingTransportFactory.java:48)
         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:208)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
         at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
         at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.thrift.transport.TTransportException
         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
         at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:178)
         at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
         at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
         at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
         at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
         ... 11 more

I get them even if there are no clients running and the tserver is the
only Accumulo process running in the cluster (no master, no other
tservers, etc.). Curiously, I don't see any network traffic on port
9997. Any idea how to debug further?

Many thanks,

James
