Hi Okada,

Thanks for your reply. Finally I see some numbers! I love numbers :)

I've shown your email to my boss (I hope he will hire me to do this
project) and he said the following:

"I would like to see this 833k/sec number for myself. Am I asking too much?
:) Can you set up a very basic and simple Kafka system to measure this
number in one of my powerful lab machines, using loopback? I don't want to
include any wire-time on this measurement, that's why I want to use
loopback (127.0.0.1). The Kafka system, consumers and producers, everything
should be running on this single powerful multi-core machine over loopback."

He is correct here. It should be very basic and simple to set up a Kafka
system on a single machine and send some messages from a producer to
consumers. Something like a Hello World Kafka example. Then I would just
need to measure the total time it takes to send 10 million messages, and
the resulting rate should be around 833k/sec. With that result I should be
able to get him to let me use Kafka.
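
To make the target concrete, here is a small Python sketch of the
arithmetic. It is only an illustration: the function names are made up, and
the inputs are the assumptions from Okada's email quoted below (10Gbps NIC,
each message transmitted 3 times, 500-byte messages). It reproduces the
833K figure and shows that 10 million messages at that rate should take
about 12 seconds.

```python
def spec_bound_msgs_per_sec(nic_bits_per_sec, copies, msg_bytes):
    """Upper bound on messages/sec derived from NIC bandwidth alone."""
    bytes_per_sec = nic_bits_per_sec / 8      # bits -> bytes
    return bytes_per_sec / copies / msg_bytes


def measured_msgs_per_sec(num_messages, elapsed_seconds):
    """Throughput of an actual run: messages delivered per second."""
    return num_messages / elapsed_seconds


# Assumptions taken from the quoted email: 10Gbps, 3 copies, 500 bytes.
bound = spec_bound_msgs_per_sec(10e9, 3, 500)
expected_seconds = 10_000_000 / bound

print(f"{bound:,.0f} msg/sec")       # 833,333 msg/sec
print(f"{expected_seconds:.1f} s")   # 12.0 s
```

So if my test run delivers all 10 million messages in roughly 12 seconds,
measured_msgs_per_sec(10_000_000, 12.0) lands on the same number.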

Can you point me to any resource explaining how I can run a simple Hello
World Kafka, that can send some messages from 1 producer to 4 consumers,
each reading from a single partition? Again, my setup is very basic: I have
10 million messages that I need to send from producers to consumers. I have
1 topic, 1 producer for this topic, 4 partitions for this topic and 4
consumers, one for each partition. What can be more trivial and basic than
that when it comes to Kafka?
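
For reference, this is what the setup means per consumer, as a rough Python
sketch. It assumes the producer spreads messages evenly over the 4
partitions (which depends on the partitioner and keys, so treat it as an
assumption), and the helper name is made up for illustration.

```python
NUM_MESSAGES = 10_000_000
NUM_PARTITIONS = 4
TARGET_TOTAL_RATE = 833_000  # msg/sec, the figure I want to reproduce


def per_partition_load(num_messages, num_partitions, total_rate):
    """Split messages and the target rate evenly across partitions."""
    base, remainder = divmod(num_messages, num_partitions)
    # Give any leftover messages to the first partitions, one each.
    counts = [base + (1 if p < remainder else 0)
              for p in range(num_partitions)]
    return counts, total_rate / num_partitions


counts, rate = per_partition_load(NUM_MESSAGES, NUM_PARTITIONS,
                                  TARGET_TOTAL_RATE)
print(counts)  # [2500000, 2500000, 2500000, 2500000]
print(rate)    # 208250.0
```

In other words, each of the 4 consumers would need to keep up with about
208k msg/sec for the total to reach 833k/sec.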

Thank you very much for your help and expertise!

Cheers,

M. Queen

On Thu, Jan 6, 2022 at 1:15 PM Haruki Okada <ocadar...@gmail.com> wrote:

> Hi, Marisa.
>
> Kafka is well designed to make full use of system resources, so I think
> a calculation based on the machine's specs is a good start.
>
> Let's say we have servers with a 10Gbps full-duplex NIC.
> Also, let's say we set the topic's replication factor to 3 (so the cluster
> will have at least 3 servers), and the average produced message size is
> 500 bytes.
>
> Then, a single machine's spec-wise throughput bound is calculated as
> follows:
> - Max messages / sec that a single machine can transmit = 10Gbps / 8
> (convert to bytes) / 3 (replicated to 2 replicas & fetched by 1 consumer
> group) / 500 (bytes per message) = 833K.
>
> Note that this is just an example, so you should also take other
> factors into account (e.g. HDD throughput).
> Also, I think producing to / consuming from a single partition at a rate
> of 833K msg/sec is a bit hard due to client-side bottlenecks, so we may
> need to adjust the partition count as well.
>
> However, at least, 833K msg/sec for 500-byte messages with the above spec
> is not far from my experience of running Kafka in production.
>
> On Fri, Jan 7, 2022 at 0:01 Marisa Queen <marisa.queen...@gmail.com> wrote:
>
> > Cheers from NYC!
> >
> > I'm trying to give a performance number to a potential client (from the
> > financial market) who asked me the following question:
> >
> > *"If I have a Kafka system setup in the best way possible for
> > performance, what is an approximate number that I can have in mind for
> > the throughput of this system?"*
> >
> > The client proceeded to say:
> >
> > *"What I want to know specifically, is how many messages per second can I
> > send from one side of my distributed system to the other side with Apache
> > Kafka."*
> >
> > And he concluded with:
> >
> > *"To give you an example, let's say I have 10 million messages that I
> > need to send from producers to consumers. Let's assume I have 1 topic, 1
> > producer for this topic, 4 partitions for this topic and 4 consumers, one
> > for each partition. What I would like to know is: How long is it going to
> > take for these 10 million messages to travel all the way from the
> > producer to the consumers? That's the throughput performance number I'm
> > interested in."*
> >
> > I read in a reddit post yesterday (for some reason I can't find the post
> > anymore) that Kafka is able to handle 7 trillion messages per day. The
> > LinkedIn article about it says:
> >
> > *"We maintain over 100 Kafka clusters with more than 4,000 brokers, which
> > serve more than 100,000 topics and 7 million partitions. The total number
> > of messages handled by LinkedIn’s Kafka deployments recently surpassed 7
> > trillion per day."*
> >
> > The OP of the reddit post went on to say that WhatsApp is handling around
> > 64 billion messages per day (740,000 msgs per sec x 24 x 60 x 60) and
> > that 7 trillion for LinkedIn is a huge number, giving a whopping 81
> > million messages per second for LinkedIn. But that doesn't matter for my
> > question.
> >
> > 7 trillion messages divided by 7 million partitions gives us 1 million
> > messages per day per partition. So to calculate the throughput we do:
> >
> >     1 million divided by 60 divided by 60 divided by 24 => *about 12
> > messages per second per partition*
> >
> > We'll all agree that about 12 messages per second per partition for
> > throughput performance is very low, so I can't give this number to my
> > potential client.
> >
> > So my question is: *What number should I give to my potential client?*
> > Note that he is a stubborn and strict bank CTO, so he won't take any talk
> > from me. He wants a mathematical answer using the scientific method.
> >
> > Has anyone been in my shoes and can shed some light on this kafka
> > throughput performance topic?
> >
> > Cheers,
> >
> > M. Queen
> >
>
>
> --
> ========================
> Okada Haruki
> ocadar...@gmail.com
> ========================
>
