Hi all,

We are seeing our workers constantly being killed by Storm, with the
following logs:
worker: 2014-05-23 20:15:08 INFO ClientCxn:1157 - Client session timed out, 
have not heard from the server in 28105ms for sessionid 0x14619bf2f4e0109, 
closing socket and attempting reconnect
supervisor: 2014-05-23 20:17:30 INFO supervisor:0 - Shutting down and clearing 
state for id 94349373-74ec-484b-a9f8-a5076e17d474. Current supervisor time: 
1400876250. State: :disallowed, Heartbeat: 
#backtype.storm.daemon.common.WorkerHeartbeat{{:time-secs 1400876249, :storm-id 
"test-46-1400863199", :executors #{[-1 -1]}, :port 6700}

Eventually Storm decides to just kill the worker and restart it, as you can see
in the supervisor log. Our theory is that the ZooKeeper heartbeat thread is
being starved of CPU by the very high load on the machine (near 100%).

I have increased the ZooKeeper timeouts in the storm.yaml config file, yet
the client session timeout message above suggests Storm is still using some
other, unknown value:
storm.zookeeper.connection.timeout: 300000
storm.zookeeper.session.timeout: 300000
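
One possibility we have not ruled out is that the ZooKeeper server, rather than
Storm, is capping the value. As far as I understand, the server clamps the
requested session timeout between minSessionTimeout and maxSessionTimeout
(2x and 20x tickTime unless overridden), and the Java client reports a timeout
after 2/3 of the negotiated value without hearing from the server. The sketch
below only illustrates that arithmetic. The function names are made up for
illustration, and the tickTime of 2000 ms is an assumption taken from the
sample zoo.cfg, not something we have confirmed on our cluster.

```python
# Illustrative arithmetic only: how a ZooKeeper server would clamp a
# requested session timeout. Function names are hypothetical, and
# tick_time_ms=2000 is an assumed value (the one in the sample zoo.cfg).

def negotiated_session_timeout_ms(requested_ms, tick_time_ms=2000,
                                  min_ms=None, max_ms=None):
    # Unless overridden in zoo.cfg, the bounds are derived from tickTime:
    # minSessionTimeout = 2 * tickTime, maxSessionTimeout = 20 * tickTime.
    if min_ms is None:
        min_ms = 2 * tick_time_ms
    if max_ms is None:
        max_ms = 20 * tick_time_ms
    return max(min_ms, min(requested_ms, max_ms))


def client_read_timeout_ms(negotiated_ms):
    # The ZooKeeper Java client treats the server as unresponsive after
    # two thirds of the negotiated session timeout.
    return negotiated_ms * 2 // 3


negotiated = negotiated_session_timeout_ms(300000)
print(negotiated, client_read_timeout_ms(negotiated))  # prints: 40000 26666
```

Under those assumptions our 300000 ms request would be reduced to 40000 ms,
and the client would give up after roughly 26.7 seconds, which is in the same
range as the 28105ms in the worker log.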

1) What timeout config is appropriate for the above timeout message?
2) Is it expected that Storm cannot keep up with its heartbeat threads under
high CPU load, or is our theory incorrect?

Thanks,
Michael
                                          
