There are a few unknown parameters here that might influence the answer,
though. Off the top of my head, at least:
- How much replication of the data is needed (for high availability), and
how many acks does the producer require? (Fire-and-forget can be fast; if
you need to replicate to and get acks from 3 brokers in different DCs, it
will be slower)
- Transactions? (If end-to-end exactly-once then it's a lot slower)
- Size of the messages? (If each message is a GB it will obviously be
slower)
- Distance and bandwidth between the producers, Kafka & the consumers? (If
the network links get saturated that would limit the performance. Latency
is likely less important than throughput, but if your consumers are in
Tokyo and the producer in London then it will likely also be slower)
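To make the dependence on these parameters concrete, here is a back-of-envelope sketch for the 10M-message scenario discussed downthread. Every number in it (message size, per-partition rate, link bandwidth) is an assumption chosen for illustration, not a measured benchmark; plug in your own values:

```python
# Rough estimate of end-to-end time for 10M messages. All constants below
# are assumptions for illustration only.

MESSAGES = 10_000_000
MSG_SIZE_BYTES = 1_000                      # assumed ~1 KB average message
PARTITIONS = 4
PER_PARTITION_MSGS_PER_SEC = 100_000        # assumed per-partition rate
NET_BANDWIDTH_BYTES_PER_SEC = 125_000_000   # assumed 1 Gbit/s link

# Limit 1: per-partition processing rate across 4 partitions
cpu_bound_secs = MESSAGES / (PARTITIONS * PER_PARTITION_MSGS_PER_SEC)

# Limit 2: raw bytes over the network link
net_bound_secs = MESSAGES * MSG_SIZE_BYTES / NET_BANDWIDTH_BYTES_PER_SEC

# Whichever limit is slower dominates the end-to-end time
estimate_secs = max(cpu_bound_secs, net_bound_secs)
print(f"cpu-bound: {cpu_bound_secs:.0f}s, net-bound: {net_bound_secs:.0f}s")
print(f"estimate: ~{estimate_secs:.0f}s")
```

With these particular assumptions the network is the bottleneck (80s vs 25s), which is exactly why the message size and link bandwidth questions above matter more than any single headline number.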

FWIW, I find that the producer side is generally the limiting factor,
especially if there is only one.
I'd take a look at e.g. the Appendix test details on
https://docs.confluent.io/2.0.0/clients/librdkafka/INTRODUCTION_8md.html. I
haven't yet seen a faster Kafka impl than rdkafka, so those would be
reasonable upper bounds.
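As a concrete illustration of the acks trade-off mentioned above, here are two producer configurations at opposite ends of the latency/durability spectrum. The keys are standard Kafka/librdkafka producer settings; the specific values are just one plausible choice for a sketch, not a recommendation:

```python
# Two illustrative producer configs for the durability/latency trade-off.
# Keys are standard Kafka/librdkafka producer settings; values are
# assumptions for illustration.

fire_and_forget = {
    "acks": "0",                # don't wait for any broker ack: fastest, lossy
    "linger.ms": "5",           # small batching window to improve throughput
    "compression.type": "lz4",  # cheap compression usually helps throughput
}

durable = {
    "acks": "all",                 # wait for all in-sync replicas: safest
    "enable.idempotence": "true",  # dedupes retries on the produce path
}

print(fire_and_forget["acks"], durable["acks"])
```

Benchmarking both ends of this spectrum on your own hardware is usually far more convincing to a customer than any published number.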

On Thu, Jan 6, 2022 at 4:25 PM Marisa Queen <marisa.queen...@gmail.com>
wrote:

> Hi Israel,
>
> Your email is great, but I'm afraid to forward it to my customer because it
> doesn't answer his question.
>
> I'm hoping that other members of this list will be able to give me a more
> NUMERIC answer; let's wait and see.
>
> Just to give you some follow up on your answer, when you say:
>
> > 30 passengers per driver or aircraft per day may not sound impressive but
> 750,000 passengers per day all together is how you should look at it
>
> Well, with this rationale one can come up with any desired throughput
> number just by adding more partitions. Do you see my customer's point that
> this does not make sense? Adding more partitions also does not come for
> free, because messages need to be redistributed across the newly created
> partitions and ordering will be lost. Order is important for some messages,
> so adding ever more partitions toward infinite throughput is not an
> option.
>
> I've just spoken to him here, his reply was:
>
> "Marisa, I'm asking a very simple question for a very basic Kafka scenario.
> If I can't get an answer for that, then I'm in trouble. Can you please find
> out with your peers/community what is a good throughput number to have in
> mind for the scenario I've been describing. Again it is a very basic and
> simple scenario: I have 10 million messages that I need to send from
> producers to consumers. Let's assume I have 1 topic, 1 producer for this
> topic, 4 partitions for this topic and 4 consumers, one for each partition.
> What I would like to know is: How long is it going to take for these 10
> million messages to travel all the way from the producer to the consumers?
> That's the throughput performance number I'm interested in."
>
> I surely won't tell him: "Hey, that's easy, you have 4 partitions, each
> partition according to LinkedIn can handle roughly 11.6 messages per
> second, so we are looking at a ~46 messages per second throughput here!"
>
> Cheers,
>
> M. Queen
>
>
> On Thu, Jan 6, 2022 at 12:58 PM Israel Ekpo <israele...@gmail.com> wrote:
>
> > Hi Marisa
> >
> > I think there may be some confusion about the throughput for each
> partition
> > and I want to explain briefly using some analogies
> >
> > Using transportation as an example: if we were to pick an airline or
> > ridesharing organization to describe the volume of customers they can
> > support per day, we would have to look at how many total customers
> > American Airlines can service in a day, or how many customers Uber or
> > Lyft can serve in a day. We would not zero in on only the number of
> > customers a particular driver can service, or the number of passengers a
> > particular aircraft can service in a day. That would be very limiting
> > considering the hundreds of thousands of aircraft and drivers actively
> > transporting passengers in real time.
> >
> > 30 passengers per driver or aircraft per day may not sound impressive but
> > 750,000 passengers per day all together is how you should look at it
> >
> > Partitions in Kafka are just a logical unit for organizing and storing
> data
> > within a Kafka topic. You should not base your analysis on just what a
> > subunit of storage is able to support.
> >
> > I would recommend taking a look at Kafka Summit talks on performance and
> > benchmarks to get some understanding of what Kafka is able to do and the
> > applicable use cases in the Financial Services industry.
> >
> > A lot of reputable organizations already trust Kafka for their needs
> > today, so this is already proven:
> >
> > https://kafka.apache.org/powered-by
> >
> > I hope this helps.
> >
> > Israel Ekpo
> > Lead Instructor, IzzyAcademy.com
> > https://www.youtube.com/c/izzyacademy
> > https://izzyacademy.com/
> >
> >
> > On Thu, Jan 6, 2022 at 10:01 AM Marisa Queen <marisa.queen...@gmail.com>
> > wrote:
> >
> > > Cheers from NYC!
> > >
> > > I'm trying to give a performance number to a potential client (from the
> > > financial market) who asked me the following question:
> > >
> > > *"If I have a Kafka system setup in the best way possible for
> > performance,
> > > what is an approximate number that I can have in mind for the
> throughput
> > of
> > > this system?"*
> > >
> > > The client proceeded to say:
> > >
> > > *"What I want to know specifically, is how many messages per second
> can I
> > > send from one side of my distributed system to the other side with
> Apache
> > > Kafka."*
> > >
> > > And he concluded with:
> > >
> > > *"To give you an example, let's say I have 10 million messages that I
> > need
> > > to send from producers to consumers. Let's assume I have 1 topic, 1
> > > producer for this topic, 4 partitions for this topic and 4 consumers,
> one
> > > for each partition. What I would like to know is: How long is it going
> to
> > > take for these 10 million messages to travel all the way from the
> > producer
> > > to the consumers? That's the throughput performance number I'm
> interested
> > > in."*
> > >
> > > I read in a reddit post yesterday (for some reason I can't find the post
> > > anymore) that Kafka is able to handle 7 trillion messages per day. The
> > > LinkedIn article about it says:
> > >
> > >
> > > *"We maintain over 100 Kafka clusters with more than 4,000 brokers,
> which
> > > serve more than 100,000 topics and 7 million partitions. The total
> number
> > > of messages handled by LinkedIn’s Kafka deployments recently surpassed
> 7
> > > trillion per day."*
> > >
> > > The OP of the reddit post went on to say that WhatsApp is handling
> around
> > > 64 billion messages per day (740,000 msgs per sec x 24 x 60 x 60) and
> > that
> > > 7
> > > trillion for LinkedIn is a huge number, giving a whopping 81 million
> > > messages per second for LinkedIn. But that doesn't matter for my
> > question.
> > >
> > > 7 trillion messages divided by 7 million partitions gives us 1 million
> > > messages per day per partition. So to calculate the throughput we do:
> > >
> > >     1 million divided by 60 divided by 60 divided by 24 => *~11.6
> > > messages per second per partition*
> > >
> > > We'll all agree that ~11.6 messages per second per partition for
> > > throughput performance is very low, so I can't give this number to my
> > > potential client.
> > >
> > > So my question is: *What number should I give to my potential client?*
> > Note
> > > that he is a stubborn and strict bank CTO, so he won't take any talk
> from
> > > me. He wants a mathematical answer using the scientific method.
> > >
> > > Has anyone been in my shoes and can shed some light on this kafka
> > > throughput performance topic?
> > >
> > > Cheers,
> > >
> > > M. Queen
> > >
> >
>
