You can also look into the producer time metrics and see if the time goes
up before the spike of the request rate. If the time does go up, you can
look at the breakdown of the time. Any latency due to local I/O will be
included in the local time.

Thanks,

Jun

On Tue, Nov 11, 2014 at 10:50 AM, Wes Chow <w...@chartbeat.com> wrote:

>
> We're seeing periodic spikes in req/sec rates across our nodes. Our
> cluster is 10 nodes, and the topic has a replication factor of 3. We push
> around 200k messages / sec into Kafka.
>
>
> The machines are running the most recent version of Kafka and we're
> connecting via librdkafka. pingstream02-10 are using the CMS garbage
> collector, but I switched pingstream01 to use G1GC under the theory that
> maybe these were GC pauses. The graph shows that likely didn't improve the
> situation.
>
> My next thought is that maybe this is the effect of log rolling. Checking
> in the logs, I see a lot of this:
>
> [2014-11-11 13:46:45,836] 72952071 [ReplicaFetcherThread-0-7] INFO
> kafka.log.Log  - Rolled new log segment for 'pings-342' in 3 ms.
> [2014-11-11 13:46:47,116] 72953351 [kafka-request-handler-0] INFO
> kafka.log.Log  - Rolled new log segment for 'pings-186' in 2 ms.
> [2014-11-11 13:46:48,155] 72954390 [ReplicaFetcherThread-0-8] INFO
> kafka.log.Log  - Rolled new log segment for 'pings-253' in 3 ms.
> [2014-11-11 13:46:48,408] 72954643 [ReplicaFetcherThread-0-4] INFO
> kafka.log.Log  - Rolled new log segment for 'pings-209' in 3 ms.
> [2014-11-11 13:46:48,436] 72954671 [ReplicaFetcherThread-0-4] INFO
> kafka.log.Log  - Rolled new log segment for 'pings-299' in 2 ms.
> [2014-11-11 13:46:48,687] 72954922 [kafka-request-handler-0] INFO
> kafka.log.Log  - Rolled new log segment for 'pings-506' in 2 ms.
>
> The "pings" topic in question has 512 partitions, so it does this 512
> times every so often. We have an effective retention period of a bit less
> than 30 min, so rolling happens pretty frequently. Still, if I assume worst
> case that rolling locks up the process for 2ms and there are 512 rolls
> every few minutes, I'd expect halting to happen for about a second at a
> time. The graphs seem to indicate much longer dips, but it's hard for me to
> know if I'm looking at real data or some sort of artifact.
>
> Fwiw, the producers are not reporting any errors, so it does not seem like
> we're losing data.
>
> I'm new to Kafka. Should I be worried? If so, how should I be debugging
> this?
>
> Thanks,
> Wes
>
>

Reply via email to