Another couple of things I found:

*A couple of Zookeeper client threads are stuck at these stacktraces for
~30 seconds*
"pool-2-thread-1-EventThread" #1218 daemon prio=5 os_prio=0
tid=0x00007ff3f5e23800 nid=0x5cd8 waiting on condition [0x00007ff3ef803000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x000000018d6d8ed8> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:504)

"pool-2-thread-1-SendThread(72.55.136.25:2181)" #1217 daemon prio=5
os_prio=0 tid=0x00007ff3f5e23000 nid=0x5cd7 runnable [0x00007ff3ef904000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x000000018d68a730> (a sun.nio.ch.Util$3)
    - locked <0x000000018d68a720> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000018d68a258> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:349)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1145)

*Running the mntr command returned the following stats*
zk_version    3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on
06/29/2018 04:05 GMT
zk_avg_latency    0
zk_max_latency    *17657*
zk_min_latency    0
zk_packets_received    1427134
zk_packets_sent    1596974
zk_num_alive_connections    64
zk_outstanding_requests    0
zk_server_state    follower
zk_znode_count    1394
zk_watch_count    592
zk_ephemerals_count    192
zk_approximate_data_size    181257
zk_open_file_descriptor_count    94
zk_max_file_descriptor_count    1048576
zk_fsync_threshold_exceed_count    1

I find the *zk_max_latency* extremely hight. I'm wondering what kind of
latency is that ? How can I debug the reason for this value ?

Thanks,
Gelbana



On Wed, Apr 3, 2019 at 1:42 PM Muhammad Gelbana <[email protected]> wrote:

> I'm trying to debug a problem where our client application suddenly loses
> its Zookeeper session. I concluded that by looking at the Zookeeper server
> logs.
>
> I increased the logging details for the client and found the following log
> messages
>
>> DEBUG: [07:33:33] [demo | HA | Manager] Got ping response for sessionid:
>> 0x3000da76fa904b6 after 0ms
>> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse]
>> DEBUG: [07:34:07] [demo | HA | Manager] Got ping response for sessionid:
>> 0x3000da76fa904b6 after 0ms
>> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse]
>> DEBUG: [07:34:40] [demo | HA | Manager] Got ping response for sessionid:
>> 0x3000da76fa904b6 after 0ms
>> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse]
>> DEBUG: [07:35:13] [demo | HA | Manager] Got ping response for sessionid:
>> 0x3000da76fa904b6 after 0ms
>> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse]
>> DEBUG: [07:35:47] [demo | HA | Manager] Got ping response for sessionid:
>> 0x3000da76fa904b6 after 0ms
>> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse]
>> DEBUG: [07:36:20] [demo | HA | Manager] Got ping response for sessionid:
>> 0x3000da76fa904b6 after 0ms
>> [org.apache.zookeeper.ClientCnxn$SendThread.readResponse]
>>
>
> I noticed that the duration between each log message is ~33 seconds while
> on another environment (my laptop), the duration goes down to ~1 second.
> What could be causing this huge difference ? I doubt that whatever is
> causing this effect causes the delay to increase significantly at some
> point to the extend that makes my client lose its session.
>

Reply via email to