Re: Kafka performance when it comes to throughput

Joris Peeters Thu, 06 Jan 2022 10:50:57 -0800

These tutorials - though quite a bit outdated - seem quite useful:
http://cloudurable.com/blog/kafka-tutorial-kafka-producer/index.html (and
the follow-ups).
They end up close to how I write this in Java, and tutorial 13 covers
batching, acks, etc., which you'll need to tune in order to maximise your
throughput.
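
As a rough illustration of the batching and acks settings mentioned above,
here is a minimal sketch of a throughput-oriented producer configuration in
Java. The keys (acks, batch.size, linger.ms, compression.type) are standard
Kafka producer settings, but the values, the class name ProducerTuningSketch
and the localhost:9092 address are assumptions picked for illustration, not
tested recommendations. It only builds a java.util.Properties object, so it
runs without the Kafka client on the classpath.

```java
import java.util.Properties;

// Minimal sketch: throughput-oriented producer settings.
// All values below are illustrative assumptions, not measured optima.
class ProducerTuningSketch {

    static final String BYTES_SERIALIZER =
            "org.apache.kafka.common.serialization.ByteArraySerializer";

    static Properties throughputProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer", BYTES_SERIALIZER);
        props.setProperty("value.serializer", BYTES_SERIALIZER);
        // acks=0 is fire-and-forget: the producer does not wait for the broker.
        props.setProperty("acks", "0");
        // Larger batches amortise per-request overhead (assumed value: 256 KiB).
        props.setProperty("batch.size", String.valueOf(256 * 1024));
        // Wait up to 10 ms so batches have a chance to fill (assumed value).
        props.setProperty("linger.ms", "10");
        // Compression trades CPU for fewer bytes sent (assumed choice).
        props.setProperty("compression.type", "lz4");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings; a real program would pass them to a KafkaProducer.
        throughputProps("localhost:9092")
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

In a real benchmark these properties would be handed to the Kafka client's
KafkaProducer constructor, and each value would be varied and measured
rather than taken as given.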


I'm sure someone else has better example resources.



On Thu, Jan 6, 2022 at 6:25 PM Marisa Queen <marisa.queen...@gmail.com>
wrote:

> Hi Joris,
>
> Thank you so much. I plan to write a Java Consumer and a Java Producer, for
> my benchmark. Do you recommend an example that I can use as a reference to
> write my basic Java producer and simple Java consumer? I'll for sure share
> the throughput number I get with the community. Maybe even write a blog post
> about it. I hope it is more than 23 messages per second per partition
> :PPPPP
>
> Cheers,
>
> M. Queen
>
>
> On Thu, Jan 6, 2022 at 2:14 PM Joris Peeters <joris.mg.peet...@gmail.com>
> wrote:
>
> > I'd just follow the instructions in https://kafka.apache.org/quickstart
> > to set up Kafka and Zookeeper on a single node, by running the Java
> > processes directly. Or you can run them in Docker.
> >
> > For the producer and consumer I'd personally use Python, as it's the
> > easiest to get going. You may want to look at
> > https://kafka-python.readthedocs.io/en/master/# (easier) and
> > https://github.com/confluentinc/confluent-kafka-python (faster). Similar
> > things exist for Go, Java, C++, ...
> > Or I'm sure there are some benchmark setups out there that you can tweak
> > a little.
> >
> > I appreciate that setting up everything on localhost will be easier and
> > lead to big numbers, but bear in mind that it's typically all the other
> > real-life stuff (remote connections, replication, at-least-once, ...)
> > that causes massive slowdowns compared to localhost, and those are
> > things banks eventually tend to need (I work in the finance industry
> > myself). What you're doing is a very useful benchmark, but I'd surround
> > it with the above caveats to avoid overpromising.
> >
> > -J
> >
> >
> > On Thu, Jan 6, 2022 at 4:58 PM Marisa Queen <marisa.queen...@gmail.com>
> > wrote:
> >
> > > Hi Joris,
> > >
> > > I've spoken to him. His answers are below:
> > >
> > >
> > > On Thu, Jan 6, 2022 at 1:37 PM Joris Peeters <joris.mg.peet...@gmail.com>
> > > wrote:
> > >
> > > > There are a few unknown parameters here that might influence the
> > > > answer, though. Off the top of my head, at least
> > > > - How much replication of the data is needed (for high availability),
> > > > and how many acks for the producer? (If fire-and-forget it can be
> > > > faster, if you need to replicate and ack from 3 brokers in different
> > > > DCs then it will be slower)
> > > >
> > >
> > > Let's assume no high-availability for now, for simplicity's sake.
> > > Fire-and-forget like he said. We don't want to overcomplicate this
> > > simple benchmark and we want the highest possible throughput number.
> > >
> > >
> > > > - Transactions? (If end-to-end exactly-once then it's a lot slower)
> > > >
> > >
> > > Again no transactions. Let's keep it simple.
> > >
> > >
> > > > - Size of the messages? (If each message is a GB it will obviously be
> > > > slower)
> > > >
> > >
> > > Let's assume 512 bytes. Powers of two are fun!
> > >
> > >
> > > > - Distance and bandwidth between the producers, Kafka & the
> > > > consumers? (If the network links get saturated that would limit the
> > > > performance. Latency is likely less important than throughput, but if
> > > > your consumers are in Tokyo and the producer in London then it will
> > > > likely also be slower)
> > > >
> > >
> > >
> > > Loopback, same machine, for the love of God. Let's not even go there.
> > > We want the highest possible throughput. I accept the limit of the
> > > speed of light. If network particularities, and distances, are to be
> > > included in this measurement then it is basically worth nothing.
> > > Loopback eliminates all those network variables that we surely don't
> > > want to include in the benchmark.
> > >
> > >
> > > >
> > > > FWIW, I find that the producer side is generally the limiting factor,
> > > > especially if there is only one.
> > > > I'd take a look at e.g. the Appendix test details on
> > > > https://docs.confluent.io/2.0.0/clients/librdkafka/INTRODUCTION_8md.html
> > > > I haven't yet seen a faster Kafka impl than rdkafka, so those would be
> > > > reasonable upper bounds.
> > > >
> > >
> > >
> > > Thanks for your reply, Joris. Can you point me to a Hello World Kafka
> > > example, so I can set up this very basic and BARE BONES Kafka system,
> > > without any of the complications you correctly mentioned above? I have
> > > 10 million messages that I need to send from producers to consumers. I
> > > have 1 topic, 1 producer for this topic, 4 partitions for this topic and
> > > 4 consumers, one for each partition. Everything loopback, same machine,
> > > no high-availability, transactions, etc. just KAFKA BARE BONES. What can
> > > be more trivial and basic than that?
> > >
> > > Cheers,
> > >
> > > M. Queen
> > >
> > >
> > > >
> > > > On Thu, Jan 6, 2022 at 4:25 PM Marisa Queen <marisa.queen...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Israel,
> > > > >
> > > > > Your email is great, but I'm afraid to forward it to my customer
> > > > > because it doesn't answer his question.
> > > > >
> > > > > I'm hoping that other members from this list will be able to give
> > > > > me a more NUMERIC answer, let's wait to see.
> > > > >
> > > > > Just to give you some follow up on your answer, when you say:
> > > > >
> > > > > > 30 passengers per driver or aircraft per day may not sound
> > > > > > impressive but 750,000 passengers per day all together is how you
> > > > > > should look at it
> > > > >
> > > > > Well, with this reasoning one can come up with any desired
> > > > > throughput number by just adding more partitions. Do you see my
> > > > > customer's point that this does not make any sense? Adding more
> > > > > partitions also does not come for free, because messages need to be
> > > > > separated into the newly created partition and ordering will be lost.
> > > > > Order is important for some messages, so to keep adding more
> > > > > partitions towards an infinite throughput is not an option.
> > > > >
> > > > > I've just spoken to him here, his reply was:
> > > > >
> > > > > "Marisa, I'm asking a very simple question for a very basic Kafka
> > > > > scenario. If I can't get an answer for that, then I'm in trouble. Can
> > > > > you please find out with your peers/community what is a good
> > > > > throughput number to have in mind for the scenario I've been
> > > > > describing. Again it is a very basic and simple scenario: I have 10
> > > > > million messages that I need to send from producers to consumers.
> > > > > Let's assume I have 1 topic, 1 producer for this topic, 4 partitions
> > > > > for this topic and 4 consumers, one for each partition. What I would
> > > > > like to know is: How long is it going to take for these 10 million
> > > > > messages to travel all the way from the producer to the consumers?
> > > > > That's the throughput performance number I'm interested in."
> > > > >
> > > > > I surely won't tell him: "Hey, that's easy, you have 4 partitions,
> > > > > each partition according to LinkedIn can handle 23 messages per
> > > > > second, so we are looking for a 92 messages per second throughput
> > > > > here!"
> > > > >
> > > > > Cheers,
> > > > >
> > > > > M. Queen
> > > > >
> > > > >
> > > > > On Thu, Jan 6, 2022 at 12:58 PM Israel Ekpo <israele...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Marisa
> > > > > >
> > > > > > I think there may be some confusion about the throughput for each
> > > > > > partition and I want to explain briefly using some analogies
> > > > > >
> > > > > > Using transportation for example, if we were to pick an airline or
> > > > > > ridesharing organization to describe the volume of customers they
> > > > > > can support per day, we would have to look at how many total
> > > > > > customers American Airlines can service in a day or how many
> > > > > > customers Uber or Lyft can serve in a day. We would not zero in on
> > > > > > only the number of customers a particular driver can service or the
> > > > > > number of passengers a particular aircraft can service in a day.
> > > > > > That would be very limiting considering the hundreds of thousands
> > > > > > of aircraft or drivers actively transporting passengers in real
> > > > > > time.
> > > > > >
> > > > > > 30 passengers per driver or aircraft per day may not sound
> > > > > > impressive but 750,000 passengers per day all together is how you
> > > > > > should look at it
> > > > > >
> > > > > > Partitions in Kafka are just a logical unit for organizing and
> > > > > > storing data within a Kafka topic. You should not base your
> > > > > > analysis on just what a subunit of storage is able to support.
> > > > > >
> > > > > > I would recommend taking a look at Kafka Summit talks on
> > > > > > performance and benchmarks to get some understanding of what Kafka
> > > > > > is able to do and the applicable use cases in the Financial
> > > > > > Services industry
> > > > > >
> > > > > > A lot of reputable organizations already trust Kafka today for
> > > > > > their needs so this is already proven
> > > > > >
> > > > > > https://kafka.apache.org/powered-by
> > > > > >
> > > > > > I hope this helps.
> > > > > >
> > > > > > Israel Ekpo
> > > > > > Lead Instructor, IzzyAcademy.com
> > > > > > https://www.youtube.com/c/izzyacademy
> > > > > > https://izzyacademy.com/
> > > > > >
> > > > > >
> > > > > > On Thu, Jan 6, 2022 at 10:01 AM Marisa Queen <marisa.queen...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Cheers from NYC!
> > > > > > >
> > > > > > > I'm trying to give a performance number to a potential client
> > > > > > > (from the financial market) who asked me the following question:
> > > > > > >
> > > > > > > *"If I have a Kafka system setup in the best way possible for
> > > > > > > performance, what is an approximate number that I can have in
> > > > > > > mind for the throughput of this system?"*
> > > > > > >
> > > > > > > The client proceeded to say:
> > > > > > >
> > > > > > > *"What I want to know specifically, is how many messages per
> > > > > > > second can I send from one side of my distributed system to the
> > > > > > > other side with Apache Kafka."*
> > > > > > >
> > > > > > > And he concluded with:
> > > > > > >
> > > > > > > *"To give you an example, let's say I have 10 million messages
> > > > > > > that I need to send from producers to consumers. Let's assume I
> > > > > > > have 1 topic, 1 producer for this topic, 4 partitions for this
> > > > > > > topic and 4 consumers, one for each partition. What I would like
> > > > > > > to know is: How long is it going to take for these 10 million
> > > > > > > messages to travel all the way from the producer to the
> > > > > > > consumers? That's the throughput performance number I'm
> > > > > > > interested in."*
> > > > > > >
> > > > > > > I read in a reddit post yesterday (for some reason I can't find
> > > > > > > the post anymore) that Kafka is able to handle 7 trillion
> > > > > > > messages per day. The LinkedIn article about it, says:
> > > > > > >
> > > > > > >
> > > > > > > *"We maintain over 100 Kafka clusters with more than 4,000
> > > > > > > brokers, which serve more than 100,000 topics and 7 million
> > > > > > > partitions. The total number of messages handled by LinkedIn’s
> > > > > > > Kafka deployments recently surpassed 7 trillion per day."*
> > > > > > >
> > > > > > > The OP of the reddit post went on to say that WhatsApp is
> > > > > > > handling around 64 billion messages per day (740,000 msgs per
> > > > > > > sec x 24 x 60 x 60) and that 7 trillion for LinkedIn is a huge
> > > > > > > number, giving a whopping 81 million messages per second for
> > > > > > > LinkedIn. But that doesn't matter for my question.
> > > > > > >
> > > > > > > 7 Trillion messages divided by 7 million partitions gives us 1
> > > > > > > million messages per day per partition. So to calculate the
> > > > > > > throughput we do:
> > > > > > >
> > > > > > >     1 million divided by 60 divided by 60 divided by 24 => *23
> > > > > > >     messages per second per partition*
> > > > > > >
> > > > > > > We'll all agree that 23 messages per second per partition for
> > > > > > > throughput performance is very low, so I can't give this number
> > > > > > > to my potential client.
> > > > > > >
> > > > > > > So my question is: *What number should I give to my potential
> > > > > > > client?* Note that he is a stubborn and strict bank CTO, so he
> > > > > > > won't take any talk from me. He wants a mathematical answer
> > > > > > > using the scientific method.
> > > > > > >
> > > > > > > Has anyone been in my shoes and can shed some light on this
> > > > > > > Kafka throughput performance topic?
> > > > > > >
> > > > > > > Cheers,
> > > > > > >
> > > > > > > M. Queen
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
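
The per-partition arithmetic quoted in the thread can be rechecked with a
few lines of Java. This is only a sketch: the class and method names are
made up for illustration, and the inputs are the figures quoted in the
thread (7 trillion messages per day, 7 million partitions, 740,000 messages
per second for WhatsApp). With these inputs, 1 million messages per day per
partition works out to roughly 11.6 messages per second rather than 23,
while the 81 million per second and 64 billion per day figures hold up.

```java
// Sketch: recompute the throughput figures quoted in the thread.
class ThroughputArithmetic {

    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    // Average messages per second for a given daily total.
    static double perSecond(double messagesPerDay) {
        return messagesPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        double linkedInPerDay = 7e12; // 7 trillion messages per day (quoted)
        double partitions = 7e6;      // 7 million partitions (quoted)
        double perPartitionPerDay = linkedInPerDay / partitions;

        System.out.printf("per partition per day:    %.0f%n", perPartitionPerDay);
        System.out.printf("per partition per second: %.2f%n", perSecond(perPartitionPerDay));
        System.out.printf("LinkedIn total per second: %.0f%n", perSecond(linkedInPerDay));
        // WhatsApp figure: 740,000 msgs per second scaled up to a full day.
        System.out.printf("WhatsApp per day:          %.0f%n", 740_000.0 * SECONDS_PER_DAY);
    }
}
```

Either way, this is a fleet-wide daily average spread over every partition,
not a measurement of what a single partition can sustain.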
