That is an excellent question!  There are a bunch of ways to monitor
jitter and see when that is happening.  Here are a few:

- You could slice the histogram every few seconds, save each slice out with
a timestamp, and then compare them over time.  This is mostly manual, or
you can graph line charts of the percentiles over time in Excel, where
each percentile is a series.  If you are using HDR Histogram, look at the
Recorder class coupled with a ScheduledExecutorService to do this (see the
first sketch after this list).

- You can just save the starting timestamp and the latency of each event.
If you put them into a CSV, you can load it into Excel and graph it as an
XY chart.  That way you can see every point during the run of your program
and spot trends.  Be careful with this one, especially about writing to a
file inside the callback that Kafka provides (see the second sketch
below).
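Here is a minimal sketch of how the first approach might be wired up,
assuming the org.hdrhistogram:HdrHistogram library is on the classpath;
the class name and the 5-second interval are illustrative, not from
anything above:

import org.HdrHistogram.Histogram;
import org.HdrHistogram.Recorder;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class IntervalLatencyRecorder {
    // Track values up to 1 hour, in microseconds, 3 significant digits.
    private final Recorder recorder =
            new Recorder(TimeUnit.HOURS.toMicros(1), 3);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start() {
        // Every 5 seconds, swap out the interval histogram and print
        // timestamped percentiles; each line is one row of the
        // percentile-over-time chart.
        scheduler.scheduleAtFixedRate(() -> {
            Histogram interval = recorder.getIntervalHistogram();
            System.out.printf("%d,%d,%d,%d%n",
                    System.currentTimeMillis(),
                    interval.getValueAtPercentile(50.0),
                    interval.getValueAtPercentile(99.9),
                    interval.getMaxValue());
        }, 5, 5, TimeUnit.SECONDS);
    }

    // Call this with each observed latency; Recorder is safe to call
    // from multiple threads.
    public void record(long latencyMicros) {
        recorder.recordValue(latencyMicros);
    }
}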
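And a sketch of the second approach that keeps file I/O out of the Kafka
callback: the callback only enqueues the observation, and a separate
thread writes the CSV.  All names here are hypothetical:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.io.PrintWriter;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class CsvLatencyLogger {
    private final BlockingQueue<long[]> samples = new LinkedBlockingQueue<>();

    public CsvLatencyLogger(PrintWriter out) {
        // Dedicated writer thread; the producer's I/O thread never
        // blocks on the file.
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    long[] s = samples.take(); // {startMillis, latencyMillis}
                    out.printf("%d,%d%n", s[0], s[1]);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "csv-latency-writer");
        writer.setDaemon(true);
        writer.start();
    }

    public void send(KafkaProducer<String, String> producer,
                     ProducerRecord<String, String> record) {
        final long start = System.currentTimeMillis();
        // The callback runs on the producer's network thread, so it only
        // enqueues; never write the file here.
        producer.send(record, (metadata, exception) -> {
            if (exception == null) {
                samples.offer(new long[] {start,
                        System.currentTimeMillis() - start});
            }
        });
    }
}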

Also, I have noticed that most of the very slow observations happen at
startup.  But don’t trust me, trust the data and share your findings.
Also, the 99.9th percentile provides a pretty good standard for typical
poor-case performance.  The average is borderline useless; the 50th
percentile is a better typical case because that’s the number that says
“half of all events will be this slow or faster”, while for high
percentiles like the 99.9th it is “0.1% of all events will be slower than
this”.
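For concreteness, here is how those numbers could be read back out of an
HDR Histogram; a small sketch, assuming latencies were recorded in
microseconds:

import org.HdrHistogram.Histogram;

public class PercentileReport {
    public static void print(Histogram h) {
        // "Half of all events will be this slow or faster."
        System.out.println("50th %'ile:   " + h.getValueAtPercentile(50.0) + " us");
        // "0.1% of all events will be slower than this."
        System.out.println("99.9th %'ile: " + h.getValueAtPercentile(99.9) + " us");
        // The single worst observation; expect jitter here.
        System.out.println("max:          " + h.getMaxValue() + " us");
        // Borderline useless for latency, but included for comparison.
        System.out.println("mean:         " + h.getMean() + " us");
    }
}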
-Erik 

On 9/4/15, 12:05 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:

>Thank you Erik! That's helpful!
>
>But I also see jitter in the maximum latencies when running the
>experiment.
>
>The average end-to-acknowledgement latency from producer to broker is
>around 5 ms when using 92 producers and 4 brokers, and the 99.9th
>percentile latency is 58 ms, but the maximum latency goes up to 1359 ms.
>How can I locate the source of this jitter?
>
>Thanks.
>
>On Fri, Sep 4, 2015 at 10:54 AM, Helleren, Erik
><erik.helle...@cmegroup.com>
>wrote:
>
>> Well… not to be contrarian, but latency depends much more on the latency
>> between the producer and the broker that is the leader for the partition
>> you are publishing to.  At least when your brokers are not saturated with
>> messages, and acks is set to 1.  If acks is set to ALL, latency on a
>> non-saturated kafka cluster will be: round-trip latency from the producer
>> to the partition leader + the slowest round-trip latency from the leader
>> to a replica of that partition.  If a cluster is saturated with messages,
>> we have to assume that all partitions receive an equal distribution of
>> messages to avoid linear algebra and queueing theory models.  I don't
>> like linear algebra :P
>>
>> Since you are probably putting all your latencies into a single histogram
>> per producer, or worse, just an average, this pattern would have been
>> obscured.  Obligatory lecture about measuring latency by Gil Tene
>> (https://www.youtube.com/watch?v=9MKY4KypBzg).  To verify this hypothesis,
>> you should re-write the benchmark to plot the latencies for each write to
>> a partition, for each producer, into a histogram.  (HDR Histogram is
>> pretty good for that.)  This would give you producers*partitions
>> histograms, which might be unwieldy for that many producers.  But wait,
>> there is hope!
>>
>> To verify that this hypothesis holds, you just have to see that there is a
>> significant difference between different partitions on a SINGLE producing
>> client.  So, pick one producing client at random and use the data from
>> that.  The easy way to do that is to plot all the partition latency
>> histograms on top of each other in the same plot; that way you have a
>> pretty plot to show people.  If you don't want to set up plotting, you can
>> just compare the medians (50th percentile) of the partitions' histograms.
>> If there is a lot of variance, your latency anomaly is explained by
>> brokers 4-7 being slower than nodes 0-3!  If there isn't a lot of variance
>> at 50%, look at higher percentiles.  And if higher percentiles for all the
>> partitions look the same, this hypothesis is disproved.
>>
>> If you want to make a general statement about the latency of writing to
>> kafka, you can merge all the histograms into a single histogram and plot
>> that.
>>
>> To Yuheng's credit, more brokers always results in more throughput.  But
>> throughput and latency are two different creatures.  It's worth noting
>> that kafka is designed to be high throughput first and low latency
>> second.  And it does a really good job at both.
>>
>> Disclaimer: I might not like linear algebra, but I do like statistics.
>> Let me know if there are topics that need more explanation above that
>> aren't covered by Gil's lecture.
>> -Erik
>>
>> On 9/4/15, 9:03 AM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
>>
>> >When I use 32 partitions, the 4-broker latency becomes larger than the
>> >8-broker latency.
>> >
>> >So is it always true that using more brokers gives lower latency when
>> >the number of partitions is at least the number of brokers?
>> >
>> >Thanks.
>> >
>> >On Thu, Sep 3, 2015 at 10:45 PM, Yuheng Du <yuheng.du.h...@gmail.com>
>> >wrote:
>> >
>> >> I am running a producer latency test.  When using 92 producers on 92
>> >> physical nodes publishing to 4 brokers, the latency is slightly lower
>> >> than when using 8 brokers.  I am using 8 partitions for the topic.
>> >>
>> >> I have rerun the test and it gives me the same result: the 4-broker
>> >> scenario still has lower latency than the 8-broker scenario.
>> >>
>> >> It is weird because I tested 1 broker, 2 brokers, 4 brokers, 8 brokers,
>> >> 16 brokers, and 32 brokers.  For all the other cases the latency
>> >> decreases as the number of brokers increases.
>> >>
>> >> 4 brokers/8 brokers is the only pair that doesn't satisfy this rule.
>> >> What could be the cause?
>> >>
>> >> I am using 200-byte messages; the test has each producer publish 500k
>> >> messages to a given topic.  Each time I change the number of brokers
>> >> for a test run, I use a new topic.
>> >>
>> >> Thanks for any advice.
>> >>
>>
>>
