Hi all,

Our production Kafka cluster has a problem: some brokers time out
connecting to ZooKeeper almost every day. There are four brokers, each
with a 10-core CPU and 8 GB of memory.
Below is the server.log, which shows that the broker cannot reach
ZooKeeper. I captured packets with tcpdump and found that the problem is on
the broker side: the broker stopped sending heartbeats to ZooKeeper, and the
session then timed out.

[2019-08-07 02:30:01,626] WARN Client session timed out, have not heard
from server in 6628ms for sessionid 0x36c2faa9d5f3cf3
(org.apache.zookeeper.ClientCnxn)
[2019-08-07 02:30:01,639] INFO Client session timed out, have not heard
from server in 6628ms for sessionid 0x36c2faa9d5f3cf3, closing socket
connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-08-07 02:30:04,709] WARN Attempting to send response via channel for
which there is no open connection, connection id 10.97.133.17:9092
-10.97.165.60:43578-52592642 (kafka.network.Processor)
[2019-08-07 02:30:04,802] WARN Attempting to send response via channel for
which there is no open connection, connection id 10.97.133.17:9092
-10.97.200.19:58674-52592642 (kafka.network.Processor)

And the kafkaServer-gc.log:

2019-08-07T02:29:56.746+0800: 12540338.986: [GC concurrent-mark-end,
0.7387587 secs]
2019-08-07T02:29:56.749+0800: 12540338.990: [GC remark
2019-08-07T02:29:56.749+0800: 12540338.990: [Finalize Marking, 0.0080879
secs]
2019-08-07T02:29:56.758+0800: 12540338.998: [GC ref-proc, 0.0011597 secs]
2019-08-07T02:29:56.759+0800: 12540338.999: [Unloading, 4.7635856 secs],
4.7984788 secs]
[Times: user=0.00 sys=0.00, real=4.80 secs]

The GC Unloading phase took about 4.8 secs. I also checked with jstack, and
the full GC count appeared to be zero.
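To check how the remark pause lines up with the session timeout in
server.log, a small script along the following lines can help. It is only a
minimal sketch: the helper names slow_phases and pause_end and the 1-second
threshold are my own assumptions, not anything from Kafka or the JDK, and the
regular expression is written only against the G1 log format shown above.

```python
import re
from datetime import datetime, timedelta

# Matches a closed G1 phase entry such as "[Unloading, 4.7635856 secs]".
# Entries that wrap nested phases (e.g. "[GC remark ...") are not matched.
PHASE_RE = re.compile(r"\[([A-Za-z][A-Za-z \-]*?), (\d+\.\d+) secs\]")


def slow_phases(gc_text, threshold_secs=1.0):
    """Return (phase name, seconds) for each phase slower than the threshold."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(gc_text.split())
    return [
        (name, float(secs))
        for name, secs in PHASE_RE.findall(flat)
        if float(secs) > threshold_secs
    ]


def pause_end(start_stamp, real_secs):
    """Add the reported wall-clock pause to the GC start timestamp."""
    start = datetime.strptime(start_stamp, "%Y-%m-%dT%H:%M:%S.%f%z")
    return start + timedelta(seconds=real_secs)


if __name__ == "__main__":
    # Values copied from the GC log above: remark start and "real=4.80 secs".
    end = pause_end("2019-08-07T02:29:56.749+0800", 4.80)
    print("remark pause ends at", end.strftime("%H:%M:%S.%f")[:-3])
```

With the values from the log above, the remark pause that starts at
02:29:56.749 and lasts 4.80 secs ends at about 02:30:01.549, shortly before
the "Client session timed out" warning at 02:30:01,626.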

JDK version: jdk1.8.0_73
Kafka version: kafka_2.11-1.1.1, running in compatibility mode:
inter.broker.protocol.version=1.1
log.message.format.version=0.9.0

JVM startup parameters:

/usr/java/jdk1.8.0_73/bin/java -XX:PermSize=128m -XX:MaxPermSize=128m
-Xms4096m -Xmx4096m -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20
-XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent
-Djava.awt.headless=true
-Xloggc:/usr/local/kafka/bin/../logs/kafkaServer-gc.log -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M


Can anyone help me figure out the problem? Thanks.
