Hey Jiangjie,

Yeah, not sure the bottleneck. It maybe the sender or lock contention on
the writer threads. You could use top or one of the java tools to check out
the per-thread cpu usage.

The benchmarking I had done previously showed an ability to max out the 1G
network cards we had with a single thread down to pretty small messages. So
I guess there is really no reason in that scenario to have lots of
producers. I think repeating the benchmarking and optimization process on
10g would be a useful exercise.

The reason for generally recommending just one producer instance is that
generally better batching actually makes both the producer and the broker
faster. So having a bunch of under-used producers tends to lead to small
requests which isn't great. But this isn't a hard and fast rule, I strongly
suspect it's possible to max out lock contention on the accumulator as well
as the sender thread (not sure what happens first).

The new producer is pretty new still so I suspect there is a fair amount of
low-hanging performance work for anyone who wanted to take a shot at it.

-Jay

On Mon, Apr 27, 2015 at 10:55 AM, Jiangjie Qin <j...@linkedin.com.invalid>
wrote:

> Hi Jay,
>
> Does o.a.k.clients.tools.ProducerPerformance provide multi-thread test? I
> did not find it.
>
> I tweaked the test a little bit to make it multi-threaded and what I found
> is that in a single thread case, with each message of 10 bytes, single
> caller thread has ~2M messages/second throughput. 2 threads gives ~1.2 M
> messages/second each, 10 threads gives ~0.11M messages/second each. My
> topic has 8 partitions.
>
> It looks in my test, the bottleneck is actually the sender thread (not
> network bandwidth) instead of number of caller threads because I'm sending
> really small uncompressed messages. In this case, you do need more than
> one producer.
>
> From what I understand, the main reasons we recommend to share a producer
> are:
> 1. We have a per partition lock in the producer, so lock contention should
> not be an issue assuming there are many partitions to send data to.
> 2. The caller threads usually are much slower than sender thread because
> they have to do the compression.
>
> So I guess in general cases sharing a producer would provide good
> performance and save memory footprint. But if sender thread becomes a
> bottleneck, it is time to have more producers.
>
> Please correct me if I miss something.
>
> Thanks.
>
> Jiangjie (Becket) Qin
>
> On 4/24/15, 3:23 PM, "Jay Kreps" <jay.kr...@gmail.com> wrote:
>
> >That should work. I recommend using the performance tool cited in the blog
> >linked from the "performance" page of the website. That tool is more
> >accurate and uses the new producer.
> >
> >On Fri, Apr 24, 2015 at 2:29 PM, Roshan Naik <ros...@hortonworks.com>
> >wrote:
> >
> >> Can we use the new 0.8.2 producer perf tool against a 0.8.1 broker ?
> >> -roshan
> >>
> >>
> >> On 4/24/15 1:19 PM, "Jay Kreps" <jay.kr...@gmail.com> wrote:
> >>
> >> >Do make sure if you are at all performance sensitive you are using the
> >>new
> >> >producer api we released in 0.8.2.
> >> >
> >> >-Jay
> >> >
> >> >On Fri, Apr 24, 2015 at 12:46 PM, Roshan Naik <ros...@hortonworks.com>
> >> >wrote:
> >> >
> >> >> Yes, I too notice the same behavior (with producer/consumer perf
> >>tool on
> >> >> 8.1.2) Š adding more threads indeed improved the perf a lot (both
> >>with
> >> >>and
> >> >> without --sync). in --sync mode
> >> >>   batch size made almost no diff, larger events improved the perf.
> >> >>
> >> >> I was doing some 8.1.2 perf testing with a 1 node broker setup
> >> >>(machine:
> >> >> 32 cpu cores, 256gb RAM, 10gig ethernet, 1 x 15000rpm disks,).
> >> >>
> >> >> My observations:
> >> >>
> >> >>
> >> >>
> >> >> ASYNC MODE:
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> Partition Count: large improvement when going from 1 to 2, beyond 2
> >>see
> >> >>a
> >> >> slight dip
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>   Number of producer threads: perf much better than sync mode with 1
> >> >> thread, perf peaks out with ~10 threads, beyond 10 thds perf impacted
> >> >> negatively
> >> >>
> >> >>
> >> >>
> >> >> SYNC MODE (does not seem to use batch size)
> >> >> Batch Size: little to no impact
> >> >> Event Size: perf scales linearly with event size
> >> >> Number of producer threads: poor perf with one thread, improves with
> >> >>more
> >> >> threads,peaks around 30 to 50 threads
> >> >> socket.send.buffer.bytes : increasing it Made a small but measurable
> >> >> difference (~4%)
> >> >>
> >> >>
> >> >> --SYNC mode was much slower.
> >> >>
> >> >>
> >> >> I modified the producer perf tool to use the scala batched producer
> >>api
> >> >> (not available in v8.2) --sync mode and perf of --sync mode was
> >>closer
> >> >>to
> >> >> async mode.
> >> >>
> >> >>
> >> >> -roshan
> >> >>
> >> >>
> >> >>
> >> >> On 4/24/15 11:42 AM, "Navneet Gupta (Tech - BLR)"
> >> >> <navneet.gu...@flipkart.com> wrote:
> >> >>
> >> >> >Hi,
> >> >> >
> >> >> >I ran some tests on our cluster by sending message from multiple
> >> >>clients
> >> >> >(machines). Each machine had about 40-100 threads per producer.
> >> >> >
> >> >> >I thought of trying out having multiple producers per clients with
> >>each
> >> >> >producer receiving messages from say 10-15 threads. I actually did
> >>see
> >> >>an
> >> >> >increase in throughput in this case. It was not one off cases but a
> >> >> >repeatable phenomenon. I called threads to producer ratio
> >> >>sharingFactor in
> >> >> >my code.
> >> >> >
> >> >> >I am not planning to use it this way in our clients sending
> >>messages to
> >> >> >Kafka but it did go against the suggestion to have single producer
> >> >>across
> >> >> >all threads.
> >> >> >
> >> >> >
> >> >> >
> >> >> >On Fri, Apr 24, 2015 at 10:27 PM, Manikumar Reddy
> >> >><ku...@nmsworks.co.in>
> >> >> >wrote:
> >> >> >
> >> >> >> Hi Jay,
> >> >> >>
> >> >> >> Yes, we are producing from single process/jvm.
> >> >> >>
> >> >> >> From docs "The producer will attempt to batch records together
> >>into
> >> >> >>fewer
> >> >> >> requests whenever multiple records are being sent to the same
> >> >> >>partition."
> >> >> >>
> >> >> >> If I understand correctly, batching happens at topic/partition
> >>level,
> >> >> >>not
> >> >> >> at Node level. right?
> >> >> >>
> >> >> >> If yes, then  both (single producer for all topics , separate
> >> >>producer
> >> >> >>for
> >> >> >> each topic) approaches
> >> >> >> may give similar performance.
> >> >> >>
> >> >> >> On Fri, Apr 24, 2015 at 9:29 PM, Jay Kreps <jay.kr...@gmail.com>
> >> >>wrote:
> >> >> >>
> >> >> >> > If you are talking about within a single process, having one
> >> >>producer
> >> >> >>is
> >> >> >> > generally the fastest because batching dramatically reduces the
> >> >> >>number of
> >> >> >> > requests (esp using the new java producer).
> >> >> >> > -Jay
> >> >> >> >
> >> >> >> > On Fri, Apr 24, 2015 at 4:54 AM, Manikumar Reddy <
> >> >> >> > manikumar.re...@gmail.com>
> >> >> >> > wrote:
> >> >> >> >
> >> >> >> > > We have a 2 node cluster with 100 topics.
> >> >> >> > > should we use a single producer for all topics or  create
> >> >>multiple
> >> >> >> > > producers?
> >> >> >> > > What is the best choice w.r.t network load/failures, node
> >> >>failures,
> >> >> >> > > latency, locks?
> >> >> >> > >
> >> >> >> > > Regards,
> >> >> >> > > Manikumar
> >> >> >> > >
> >> >> >> >
> >> >> >>
> >> >> >
> >> >> >
> >> >> >
> >> >> >--
> >> >> >Thanks & Regards,
> >> >> >Navneet Gupta
> >> >>
> >> >>
> >>
> >>
>
>

Reply via email to