One thing you might try is raising the consensus RPC timeout
(--consensus_rpc_timeout_ms) from 1 second to 30 seconds. We changed the
default to 30 seconds in later versions.

I'd also recommend upgrading to 1.4 or 1.5 for other related fixes to
consensus stability. I think I recall you were still on 1.3?
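
To check whether the timeouts are concentrated on a few peers or spread
across the cluster, something like the following Python sketch could tally
them from a tablet server's WARNING log. It assumes each warning sits on a
single line in the same format as the one you pasted; the function name
tally_timeouts and the regex are only illustrative and are not part of Kudu.

```python
import re
from collections import Counter

# Pulls the peer UUID, the peer's host:port, and the reported timeout out of
# a consensus_peers.cc "Couldn't send request" warning. The pattern is an
# assumption based on the single log line quoted in this thread.
TIMEOUT_RE = re.compile(
    r"-> Peer (?P<peer>[0-9a-f]{32}) \((?P<addr>[^)]+)\): Couldn't send request"
    r".*UpdateConsensus RPC to \S+ timed out after (?P<secs>[\d.]+)s"
)


def tally_timeouts(lines):
    """Count UpdateConsensus timeout warnings per destination peer address."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("addr")] += 1
    return counts


# The warning from the original message, joined back onto one line.
sample = (
    "W1104 10:25:16.833736 10271 consensus_peers.cc:365] T "
    "149ffa58ac274c9ba8385ccfdc01ea14 P 59c768eb799243678ee7fa3f83801316 -> Peer "
    "1c67a7e7ff8f4de494469766641fccd1 (cloud-sk-ds-08:7050): Couldn't send "
    "request to peer 1c67a7e7ff8f4de494469766641fccd1 for tablet "
    "149ffa58ac274c9ba8385ccfdc01ea14. Status: Timed out: UpdateConsensus RPC to "
    "10.6.60.9:7050 timed out after 1.000s (SENT). Retrying in the next "
    "heartbeat period. Already tried 5 times."
)

for addr, count in tally_timeouts([sample]).most_common():
    print(addr, count)
```

To run it against a real log, pass an open file object to tally_timeouts
in place of the one-element list. If one or two addresses dominate the
counts, those machines are probably worth looking at first.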

Todd


On Nov 3, 2017 7:47 PM, "Lee King" <yuyunliu...@gmail.com> wrote:

Hi,
    Our Kudu cluster has run well for a long time, but writes have become
slow recently, and clients are also seeing RPC timeouts. I checked the
warnings and found a large number of errors like this:
W1104 10:25:16.833736 10271 consensus_peers.cc:365] T
149ffa58ac274c9ba8385ccfdc01ea14 P 59c768eb799243678ee7fa3f83801316 -> Peer
1c67a7e7ff8f4de494469766641fccd1 (cloud-sk-ds-08:7050): Couldn't send
request to peer 1c67a7e7ff8f4de494469766641fccd1 for tablet
149ffa58ac274c9ba8385ccfdc01ea14. Status: Timed out: UpdateConsensus RPC to
10.6.60.9:7050 timed out after 1.000s (SENT). Retrying in the next
heartbeat period. Already tried 5 times.
    I changed the configuration to
rpc_service_queue_length=400, rpc_num_service_threads=40,
but it had no effect.
    Our cluster has 5 masters and 10 tablet servers, with 3800G of data and
800 tablets per tablet server. I checked one of the tablet server machines:
14G of memory free (128G in total), 4739 threads (max 32000), 28000 open
files (max 65536), CPU utilization about 30% (32 cores), and disk
utilization below 30%.
    Any suggestions? Thanks!
