No problem. Thanks for your advice. I think it would be fun to explore. I only know how to program in Java, though. Hope it will work.
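As a starting point, here is a minimal Java sketch of the per-event recording idea from the thread below: keep the start timestamp and latency of each event in memory, and write the CSV only after the run so that no file I/O happens inside the producer callback. LatencyLog and its methods are hypothetical names, not part of Kafka, and the loop in main is only a stand-in for a real send and ack.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: buffers (start timestamp, latency) pairs in memory during a run. */
final class LatencyLog {
    // Each entry is {startMillis, latencyMicros}.
    private final List<long[]> samples = new ArrayList<>();

    /** Cheap enough to call from an ack callback: an in-memory append, no I/O. */
    synchronized void record(long startMillis, long latencyMicros) {
        samples.add(new long[] {startMillis, latencyMicros});
    }

    synchronized int size() {
        return samples.size();
    }

    /** Call once after the test finishes; writes a header and one CSV row per event. */
    synchronized void writeCsv(Writer out) {
        PrintWriter pw = new PrintWriter(out);
        pw.println("start_ms,latency_us");
        for (long[] s : samples) {
            pw.println(s[0] + "," + s[1]);
        }
        pw.flush();
    }

    public static void main(String[] args) {
        LatencyLog log = new LatencyLog();
        for (int i = 0; i < 3; i++) {
            long startMillis = System.currentTimeMillis();
            long startNanos = System.nanoTime();
            // Stand-in for "send a message and wait for the ack" (assumption).
            long latencyMicros = (System.nanoTime() - startNanos) / 1_000;
            log.record(startMillis, latencyMicros);
        }
        StringWriter csv = new StringWriter();
        log.writeCsv(csv);
        System.out.print(csv);
    }
}
```

The resulting CSV can be loaded into Excel and graphed as an XY chart of latency against start time.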
On Fri, Sep 4, 2015 at 2:03 PM, Helleren, Erik <erik.helle...@cmegroup.com> wrote:
> I think the suggestion is to have partitions/brokers >= 1, so 32 should
> be enough.
>
> As for latency tests, there isn’t a lot of code to do a latency test. If
> you just want to measure ack time, it's around 100 lines. I will try to
> push out some good latency testing code to GitHub, but my company is
> scared of open sourcing code… so it might be a while…
> -Erik
>
>
> On 9/4/15, 12:55 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
>
> >Thanks for your reply Erik. I am running some more tests according to
> >your suggestions now and I will share my results here. Is it necessary
> >to use a fixed number of partitions (32 partitions maybe) for my test?
> >
> >I am testing 2, 4, 8, 16 and 32 broker scenarios, all of them running
> >on individual physical nodes. So I think using at least 32 partitions
> >will make more sense? I have seen latencies increase as the number of
> >partitions goes up in my experiments.
> >
> >To get the latency of each event recorded, are you suggesting that I
> >write my own test program (in Java perhaps), or can I just modify the
> >standard test program provided by Kafka (
> >https://gist.github.com/jkreps/c7ddb4041ef62a900e6c )? I guess I need
> >to rebuild the source if I modify the standard Java test program
> >ProducerPerformance provided in Kafka, right? Right now this standard
> >program only reports average latencies and percentile latencies but no
> >per-event latencies.
> >
> >Thanks.
> >
> >On Fri, Sep 4, 2015 at 1:42 PM, Helleren, Erik
> ><erik.helle...@cmegroup.com> wrote:
> >
> >> That is an excellent question! There are a bunch of ways to monitor
> >> jitter and see when that is happening. Here are a few:
> >>
> >> - You could slice the histogram every few seconds, save it out with a
> >> timestamp, and then look at how they compare.
> >> This would be mostly manual, or you can graph line charts of the
> >> percentiles over time in Excel, where each percentile would be a
> >> series. If you are using HDR Histogram, you should look at how to use
> >> the Recorder class to do this, coupled with a
> >> ScheduledExecutorService.
> >>
> >> - You can just save the starting timestamp of the event and the
> >> latency of each event. If you put it into a CSV, you can just load it
> >> up into Excel and graph it as an XY chart. That way you can see every
> >> point during the running of your program and you can see trends. You
> >> want to be careful about this one, especially about writing to a file
> >> in the callback that Kafka provides.
> >>
> >> Also, I have noticed that most of the very slow observations are at
> >> startup. But don’t trust me, trust the data and share your findings.
> >> Also, having a 99.9 percentile provides a pretty good standard for
> >> typical poor case performance. Average is borderline useless; the
> >> 50%’ile is a better typical case because that’s the number that says
> >> “half of events will be this slow or faster”, or for values that are
> >> high like the 99.9%’ile, “0.1% of all events will be slower than
> >> this”.
> >> -Erik
> >>
> >> On 9/4/15, 12:05 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
> >>
> >> >Thank you Erik! That is helpful!
> >> >
> >> >But I also see jitter in the maximum latencies when running the
> >> >experiment.
> >> >
> >> >The average end-to-acknowledgement latency from producer to broker
> >> >is around 5 ms when using 92 producers and 4 brokers, and the 99.9
> >> >percentile latency is 58 ms, but the maximum latency goes up to
> >> >1359 ms. How can I locate the source of this jitter?
> >> >
> >> >Thanks.
> >> >
> >> >On Fri, Sep 4, 2015 at 10:54 AM, Helleren, Erik
> >> ><erik.helle...@cmegroup.com> wrote:
> >> >
> >> >> Well… not to be contrarian, but latency depends much more on the
> >> >> latency between the producer and the broker that is the leader for
> >> >> the partition you are publishing to. At least when your brokers
> >> >> are not saturated with messages, and acks are set to 1. If acks
> >> >> are set to ALL, latency on a non-saturated Kafka cluster will be:
> >> >> Round Trip Latency from producer to leader for partition +
> >> >> Max(Round Trip Latency to each replica of that partition). If a
> >> >> cluster is saturated with messages, we have to assume that all
> >> >> partitions receive an equal distribution of messages to avoid
> >> >> linear algebra and queueing theory models. I don't like linear
> >> >> algebra :P
> >> >>
> >> >> Since you are probably putting all your latencies into a single
> >> >> histogram per producer, or worse, just an average, this pattern
> >> >> would have been obscured. Obligatory lecture about measuring
> >> >> latency by Gil Tene
> >> >> (https://www.youtube.com/watch?v=9MKY4KypBzg). To verify this
> >> >> hypothesis, you should rewrite the benchmark to plot the latencies
> >> >> for each write to a partition for each producer into a histogram.
> >> >> (HDR Histogram is pretty good for that.) This would give you
> >> >> producers*partitions histograms, which might be unwieldy for that
> >> >> many producers. But wait, there is hope!
> >> >>
> >> >> To verify that this hypothesis holds, you just have to see that
> >> >> there is a significant difference between different partitions on
> >> >> a SINGLE producing client. So, pick one producing client at random
> >> >> and use the data from that.
> >> >> The easy way to do that is to plot all the partition latency
> >> >> histograms on top of each other in the same plot; that way you
> >> >> have a pretty plot to show people. If you don't want to set up
> >> >> plotting, you can just compare the medians (50th percentile) of
> >> >> the partitions' histograms. If there is a lot of variance, your
> >> >> latency anomaly is explained by brokers 4-7 being slower than
> >> >> brokers 0-3! If there isn't a lot of variance at 50%, look at
> >> >> higher percentiles. And if the higher percentiles for all the
> >> >> partitions look the same, this hypothesis is disproved.
> >> >>
> >> >> If you want to make a general statement about latency of writing
> >> >> to Kafka, you can merge all the histograms into a single histogram
> >> >> and plot that.
> >> >>
> >> >> To Yuheng's credit, more brokers always results in more
> >> >> throughput. But throughput and latency are two different
> >> >> creatures. It's worth noting that Kafka is designed to be high
> >> >> throughput first and low latency second. And it does a really good
> >> >> job at both.
> >> >>
> >> >> Disclaimer: I might not like linear algebra, but I do like
> >> >> statistics. Let me know if there are topics above that need more
> >> >> explanation and aren't covered by Gil's lecture.
> >> >> -Erik
> >> >>
> >> >> On 9/4/15, 9:03 AM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
> >> >>
> >> >> >When I use 32 partitions, the 4-broker latency becomes larger
> >> >> >than the 8-broker latency.
> >> >> >
> >> >> >So is it always true that using more brokers gives lower latency
> >> >> >when the number of partitions is at least the number of brokers?
> >> >> >
> >> >> >Thanks.
> >> >> >
> >> >> >On Thu, Sep 3, 2015 at 10:45 PM, Yuheng Du
> >> >> ><yuheng.du.h...@gmail.com> wrote:
> >> >> >
> >> >> >> I am running a producer latency test.
> >> >> >> When using 92 producers on 92 physical nodes publishing to 4
> >> >> >> brokers, the latency is slightly lower than when using 8
> >> >> >> brokers. I am using 8 partitions for the topic.
> >> >> >>
> >> >> >> I have rerun the test and it gives me the same result: the
> >> >> >> 4-broker scenario still has lower latency than the 8-broker
> >> >> >> scenario.
> >> >> >>
> >> >> >> It is weird because I tested 1 broker, 2 brokers, 4 brokers,
> >> >> >> 8 brokers, 16 brokers and 32 brokers. For the rest of the cases
> >> >> >> the latency decreases as the number of brokers increases.
> >> >> >>
> >> >> >> 4 brokers/8 brokers is the only pair that doesn't satisfy this
> >> >> >> rule. What could be the cause?
> >> >> >>
> >> >> >> I am using 200-byte messages, and the test lets each producer
> >> >> >> publish 500k messages to a given topic. Every time I change the
> >> >> >> number of brokers for a test run, I use a new topic.
> >> >> >>
> >> >> >> Thanks for any advice.
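
A minimal Java sketch of the per-partition comparison Erik suggests above, assuming latencies from a single producer are recorded in microseconds together with the partition each message went to. PartitionLatencies is a hypothetical helper, not a Kafka or HDR Histogram class, and it uses a plain nearest-rank percentile over sorted samples instead of a real histogram, which is probably fine for a quick check but keeps every sample in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: keeps one producer's latencies per partition and reports percentiles. */
final class PartitionLatencies {
    private final Map<Integer, List<Long>> byPartition = new TreeMap<>();

    /** Record one acknowledged write; partition and latency come from the caller. */
    synchronized void record(int partition, long latencyMicros) {
        byPartition.computeIfAbsent(partition, p -> new ArrayList<>()).add(latencyMicros);
    }

    /** Nearest-rank percentile (0 < pct <= 100) over one partition's samples. */
    synchronized long percentile(int partition, double pct) {
        List<Long> samples = byPartition.get(partition);
        if (samples == null || samples.isEmpty()) {
            throw new IllegalArgumentException("no samples for partition " + partition);
        }
        long[] sorted = samples.stream().mapToLong(Long::longValue).sorted().toArray();
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Prints the median and 99.9th percentile for each partition, one row per partition. */
    synchronized void printSummary() {
        for (int partition : byPartition.keySet()) {
            System.out.println("partition " + partition
                    + " p50=" + percentile(partition, 50.0) + "us"
                    + " p99.9=" + percentile(partition, 99.9) + "us");
        }
    }

    public static void main(String[] args) {
        PartitionLatencies stats = new PartitionLatencies();
        // Made-up sample values, only to show the calls.
        stats.record(0, 100);
        stats.record(0, 200);
        stats.record(0, 300);
        stats.record(1, 5000);
        stats.record(1, 100);
        stats.record(1, 300);
        stats.printSummary();
    }
}
```

If the medians differ a lot between partitions on the same producer, that supports the slow-broker hypothesis; if they match, the next step would be comparing the higher percentiles in the same way.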