When you said "The only difference we could see is that thread usage
decreases during these period", did you mean thread usage increases?
You can monitor the usage of two different thread pools: the network
thread pool and the requestHandler thread pool. If neither of them is
high and yet you have a large C
There can be many reasons for the lag. For example:
1. This specific partition is getting 10x the incoming traffic (produce
requests) of other partitions.
2. The leader of this partition is hosted on a node which is very busy,
so all write requests to this partition have high latency.
3. Cons