My experience with slow fsyncs is that they are almost always due to
contention for disk IO. I see that you tuned the snap* sizes down, which is
reasonable. You might check what ZK activity is happening during this
period. Perhaps some client is hammering the cluster; have you ruled
that out?
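If you want to confirm or rule out disk contention independently of ZK, one
option is to time fsync directly on the disk that holds the dataDir around
17:30. Per-interval IO averages may hide a short stall, while a direct probe
shows it. Below is a minimal sketch, assuming Python 3 on the ZK host; the
function name and defaults are hypothetical, and it measures the disk, not
ZooKeeper itself.

```python
import os
import tempfile
import time


def measure_fsync_ms(directory, size_bytes=64 * 1024, rounds=5):
    """Hypothetical probe: append size_bytes to a temp file in `directory`,
    fsync after each write, and return per-round fsync latencies in ms."""
    payload = os.urandom(size_bytes)
    fd, path = tempfile.mkstemp(dir=directory, prefix="fsync-probe-")
    latencies = []
    try:
        for _ in range(rounds):
            os.write(fd, payload)           # dirty some pages first
            start = time.perf_counter()
            os.fsync(fd)                    # time only the flush to disk
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)                     # leave nothing behind in dataDir
    return latencies
```

For example, calling `max(measure_fsync_ms("/path/to/dataDir"))` (path is a
placeholder) every few seconds from a small loop or cron job between 17:00
and 18:00 would show whether fsync latency on that disk jumps at the same
time as the ZK warning.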

I searched the mail archives and there are other folks reporting this
issue; you might take a look. This thread in particular may be worth
checking out:
https://lists.apache.org/thread/qjrlprmt7pdy63ztvjtvkd0f5zgw5dgk

Patrick

On Thu, Apr 18, 2024 at 3:31 AM Xu Bill <xuzhili1...@hotmail.com> wrote:

> Hello,
>
> I have a pretty weird issue with ZooKeeper.
> Every day around 17:30, ZooKeeper logs a warning that says
> "fsync-ing the write ahead log in SyncThread:0 took 36919ms which will
> adversely effect operation latency.File size is 16777232 bytes.". This
> causes the clients connected to ZooKeeper to time out, and I have to
> restart them every day.
>
> Although I don't think the txn log file is too big to be handled
> quickly,
> I still tried tuning parameters to reduce the size of the txn log. Below
> is my configuration.
> preAllocSize=16M
> snapCount=30000
> snapSizeLimitInKb=32M
>
> Even with this configuration, I still got the warnings.
>
> I also monitored the IO stats on the disk that holds the ZooKeeper
> data dir,
> but the stats were the same as usual.
>
> Can anybody suggest how to solve or investigate this
> issue?
> I am using ZooKeeper 3.7.2.
> The IO stats were tps=122, reading=20.1k/s, writing=2M/s while the warning
> was occurring.
>
> Best regards,
> Bill
>
