Hi Marisa,

I am going to be running it in a Kubernetes cluster on Azure Kubernetes
Service, using the setup scripts available in my GitHub repo:

https://github.com/izzyacademy/kafka-in-a-box
https://youtu.be/TDw3tDAiBBM

I will also review the recommendations from the EventSizer.io tool to
make sure the machines in my Kubernetes cluster can provide the
expected throughput:
https://eventsizer.io/

I am thinking about running between 1 and 6 producers (1, 2, 3, or 6)
during the tests to ship the events to a cluster of 3 brokers (with or
without ZooKeeper), and on the retrieval side I plan to run between 1
and 6 consumers (1, 2, 3, or 6).

The test topics will have between 1 and 12 partitions, and we will
measure throughput for each of the different permutations.
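
For reference, here is a rough sketch of how I plan to create the test
topics with the Java AdminClient. The bootstrap address, topic names
and replication factor of 3 are placeholders for illustration; the
actual values will come from the setup scripts in the repo above.

    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateBenchmarkTopics {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Placeholder bootstrap address; replace with the AKS service endpoint.
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker:9092");

            try (Admin admin = Admin.create(props)) {
                // One topic per partition count in the test matrix.
                for (int partitions : new int[] {1, 2, 3, 6, 12}) {
                    NewTopic topic =
                        new NewTopic("bench-" + partitions + "p", partitions, (short) 3);
                    admin.createTopics(List.of(topic)).all().get();
                }
            }
        }
    }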

I will share the machine, CPU, RAM and disk configurations soon. I
thought about running it on my laptop (4 CPUs, 32 GB RAM, 1 TB
storage), but I think we might have a bottleneck with the partition
leaders and followers competing to write to the same disk
simultaneously, and I don't want anything to skew the results. Running
in the Kubernetes cluster instead ensures that nothing else competes
for RAM, CPU or disk resources during the tests, and I can attach
individual, isolated premium persistent storage to each broker pod to
avoid the shared-disk contention.

We will be using Kafka 3.1.0 on Ubuntu 20.04 Docker images, with JDK
11 or 17 on these machines, and we will set up the brokers in both
ZooKeeper mode and KRaft mode.
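
For the KRaft runs, each broker will need a server.properties roughly
along these lines, plus the one-time storage format step. This is an
abridged sketch with placeholder node IDs, hostnames and paths; the
real values will be generated by the setup scripts.

    # Combined broker/controller node for the KRaft test cluster (illustrative values)
    process.roles=broker,controller
    node.id=1
    controller.quorum.voters=1@kafka-0:9093,2@kafka-1:9093,3@kafka-2:9093
    listeners=PLAINTEXT://:9092,CONTROLLER://:9093
    controller.listener.names=CONTROLLER
    listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
    log.dirs=/var/lib/kafka/data

    # One-time formatting of the log directory before first start
    # bin/kafka-storage.sh format -t $(bin/kafka-storage.sh random-uuid) -c server.properties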

I will make sure to use premium SSD storage, as recommended by the
experts, so that writing to disk is not a bottleneck, especially when
targeting maximum throughput and the lowest latency. The machine,
storage and network performance need to be optimal, so I will share my
setup as soon as it is finalized.

Each of the producers, brokers and consumers will also need sufficient
heap space allocated to support the expected performance, and the
network and disk performance of each machine should support our
throughput expectations.
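
As an illustration, broker heap can be set through the standard
KAFKA_HEAP_OPTS environment variable before starting the broker; the
4 GB figure below is only a placeholder, not a recommendation, and the
final sizes will come out of the sizing exercise.

    # Illustrative heap settings for a broker pod
    export KAFKA_HEAP_OPTS="-Xms4g -Xmx4g"
    bin/kafka-server-start.sh config/kraft/server.properties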

Finally, I appreciate your warm-up recommendation and I will make sure
to do that. I had almost forgotten about it, so I will do an initial
run-through to warm up the caches before we start any measurements.
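
The producer side of the harness will look something like the sketch
below: a warm-up batch is sent and flushed first, and only then does
the timed run begin. This is a simplified outline, not the final code;
the bootstrap address, topic name and message counts are placeholders,
and the payload matches the 512-byte message size discussed earlier in
the thread.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ProducerBenchmark {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker:9092"); // placeholder
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            String topic = "bench-6p";          // placeholder topic from the test matrix
            String payload = "x".repeat(512);   // 512-byte message body

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Warm-up pass: exercise the code paths and caches, then discard the timing.
                for (int i = 0; i < 100_000; i++) {
                    producer.send(new ProducerRecord<>(topic, Integer.toString(i), payload));
                }
                producer.flush();

                // Measured pass.
                long start = System.nanoTime();
                for (int i = 0; i < 1_000_000; i++) {
                    producer.send(new ProducerRecord<>(topic, Integer.toString(i), payload));
                }
                producer.flush();
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.printf("Produced 1,000,000 messages in %d ms%n", elapsedMs);
            }
        }
    }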

I plan to run each of the tests with 1 thousand, 100 thousand, 1
million, and 1 billion messages, and we will measure the end-to-end
time from production to broker arrival to consumer pickup.
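
On the pickup side, a rough sketch of how I intend to approximate the
produce-to-consume lag is below, using the record's producer-assigned
CreateTime timestamp. It assumes the default message.timestamp.type
and that producer and consumer clocks are in sync, which should hold
when everything runs inside the same cluster; the bootstrap address,
group id and topic name are placeholders.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class EndToEndConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker:9092"); // placeholder
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "bench-consumers");
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            long received = 0;
            long totalLagMs = 0;
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("bench-6p")); // placeholder topic
                while (received < 1_000_000) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    long now = System.currentTimeMillis();
                    for (ConsumerRecord<String, String> record : records) {
                        // record.timestamp() is the producer CreateTime by default.
                        totalLagMs += now - record.timestamp();
                        received++;
                    }
                }
            }
            System.out.printf("Average produce-to-pickup lag: %.2f ms%n",
                (double) totalLagMs / received);
        }
    }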

I am making notes and writing out the plan and scenarios prior to
spinning up the cluster, producers and consumers.

This may take some time given that I am working on other projects at the
moment, but I will do my best to share the results with the community as
soon as possible.

Have a great weekend. I will be in touch soon.

Israel Ekpo
Lead Instructor, IzzyAcademy.com
https://www.youtube.com/c/izzyacademy
https://izzyacademy.com/


On Sat, Jan 8, 2022 at 10:29 AM Marisa Queen <marisa.queen...@gmail.com>
wrote:

> Hi Israel,
>
> Great job! It looks great and promising. I really like your YouTube channel
> and the way you present the material. A couple of things that you might
> want to consider for your benchmark experiments:
>
> 1) What machine are you going to use? Is it a fast machine with enough CPU
> cores? I would say you need at least 8 cores: 1 for ZooKeeper, 1 for Kafka,
> 1 for the producer and 1 for each consumer. I can run your benchmark code
> on my fast lab machine if you want to, and report the numbers.
>
> 2) Don't forget to warm up the code, before starting your measurements.
> Feel free to use my measurement code, which I'm also using for my
> benchmarks. Check its main method for an example of how to use it. Let me
> know if you have any questions about this class:
> https://www.codepile.net/pile/0Dd2J9kR
>
> Cheers,
>
> M. Queen
>
>
> On Fri, Jan 7, 2022 at 9:06 PM Israel Ekpo <israele...@gmail.com> wrote:
>
> > Marisa,
> >
> > I have kicked off the video series on performance optimization for the
> > Kafka setup.
> >
> > I will be working on the various configurations for latency, throughput,
> > availability and durability.
> >
> > https://youtu.be/aPlbG349cXg
> >
> > The first ones will be on latency and throughput which is what you are
> > interested in and then I will work on the demos for availability and
> > durability later.
> >
> > This will be done in KRaft and Legacy mode with sample datasets of 1000,
> > 100000, and 1000000 messages end-to-end with variations in the number of
> > producers, consumers and partitions.
> >
> > I am looking forward to this.
> >
> > Israel Ekpo
> > Lead Instructor, IzzyAcademy.com
> > https://www.youtube.com/c/izzyacademy
> > https://izzyacademy.com/
> >
> >
> > On Thu, Jan 6, 2022 at 10:17 PM Marisa Queen <marisa.queen...@gmail.com>
> > wrote:
> >
> > > Wow, that's awesome! I wasn't expecting that. I truly appreciate your
> > help
> > > and professionalism.
> > >
> > > > Let me find some time soon and I will do a video on that scenario
> > > optimized primarily for low latency and throughput. I will also compare
> > how
> > > this performs when adjusted for durability and high availability.
> > >
> > > Take your time! That will be tremendously helpful. I was going to try
> to
> > do
> > > that myself, but I'm sure that you have better expertise to tune the
> > knobs
> > > for a more realistic and professional benchmark.
> > >
> > > I'm curious to see the numbers. Perhaps you can even start with the
> > > simplest of all setups: 1 producer, 1 topic, 1 partition, 1 consumer.
> 10
> > > million messages flowing. What messages-per-second number do you get?
> > Then
> > > move to 1 producer, 1 topic, 4 partitions, 1 consumer. Did it get
> better
> > or
> > > worse with the addition of multiple partitions?
> > >
> > > Thanks again, Israel.
> > >
> > > Cheers,
> > >
> > > M. Queen
> > >
> > >
> > > On Thu, Jan 6, 2022 at 11:52 PM Israel Ekpo <israele...@gmail.com>
> > wrote:
> > >
> > > > Thanks for your response Marisa.
> > > >
> > > > This has been a very interesting discussion and I appreciate it.
> > > >
> > > > It is a bit of a challenge in the sense that I wish I had a demo
> ready
> > to
> > > > go with a similar use case and expectations to easily explain what I
> > have
> > > > been trying to convey
> > > >
> > > > I am always ready for a challenge like this and to fix this I would
> like
> > > to
> > > > do a demo soon with the 10 million message scenario you
> > > > originally mentioned in your first message to track the time end to
> end
> > > > while capturing other metrics.
> > > >
> > > > Let me find some time soon and I will do a video on that scenario
> > > optimized
> > > > primarily for low latency and throughput. I will also compare how
> this
> > > > performs when adjusted for durability and high availability
> > > >
> > > > I wish I had this demo ready before now; it would have clarified a lot
> > of
> > > > what I have been trying to explain regarding tuning the knobs for
> > > latency,
> > > > throughput, high availability and durability. By achieving what you
> are
> > > > willing to pay for I was only suggesting that highly performant
> system
> > > can
> > > > be expensive at times and I apologize if the tone came out wrong
> > > >
> > > >  I am grateful that you brought this up and it will give the
> community
> > > > something to reference in the future if similar questions come up
> > > regarding
> > > > benchmarks
> > > >
> > > > Thanks for bringing up the question, please let us know if you have
> > > > additional questions and you can reach out with any further questions
> > or
> > > > feedback you may have.
> > > >
> > > > Thanks again
> > > >
> > > > Sincerely
> > > > Israel Ekpo
> > > >
> > > > On Thu, Jan 6, 2022 at 9:18 PM Marisa Queen <
> marisa.queen...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > Hi Israel,
> > > > >
> > > > > > You can achieve any performance benchmark you are willing to pay
> > for.
> > > > >
> > > > > Thanks for your email. Allow me to respectfully disagree. I believe
> > > that
> > > > > some systems are better than others when it comes to performance.
> The
> > > > idea
> > > > > that I can just take a slow system, multiply by 1 million, and
> then I
> > > > have
> > > > > a super fast system, is at the least misleading. Assuming the same
> > > > hardware
> > > > > for everyone, some languages are faster than others. Some
> algorithms
> > > are
> > > > > faster than others. Some architectures are more efficient than
> > others.
> > > > Some
> > > > > protocols are faster than others.
> > > > >
> > > > > Take a binary search vs a linear search for example. Binary search
> is
> > > of
> > > > > course much faster and more efficient than linear search (for large
> > > > lists),
> > > > > but according to your rationale this is not a problem. Just buy
> > enough
> > > > > machines to do linear search in parallel and you can boast 1
> million
> > > > > searches per second. What an amazing search system you are
> deploying!
> > > It
> > > > > can do 1 million searches per second, that's more than enough for
> any
> > > > > system.
> > > > >
> > > > > 7 TRILLION messages per day for Kafka/LinkedIn sounds amazing when
> > just
> > > > > thrown on the table. Using your example, a transportation company
> can
> > > > > transport 5 packages per day using one of its bicycles. Is the
> > > > architecture
> > > > > of this company efficient? Fast? According to your rationale, it
> does
> > > not
> > > > > matter! The company needs only to buy 1 million bikes, and now it
> can
> > > > boast
> > > > > about delivering 5 million packages per day. You can say the
> company
> > > is a
> > > > > large corporation, but when it comes to efficiency it is more like
> a
> > > > > dinosaur. It has a high chance of being replaced by other more
> > > efficient
> > > > > companies in the future.
> > > > >
> > > > > To summarize, low latency is crucial for finance applications. You
> > > can't
> > > > > just say: "don't worry, it is proven and it can do 7 trillion
> > messages
> > > > per
> > > > > day". That just won't do it. A ceiling benchmark number, for
> latency
> > > and
> > > > > throughput, is paramount for any system that wants to operate in
> that
> > > > > industry. The answer is not "as much as you are willing to pay
> for".
> > > > >
> > > > > Cheers,
> > > > >
> > > > > M. Queen
> > > > >
> > > > >
> > > > > On Thu, Jan 6, 2022 at 8:53 PM Israel Ekpo <israele...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Marisa,
> > > > > >
> > > > > > I do not agree with your assessment. There are several factors
> that
> > > > could
> > > > > > influence your performance numbers even with localhost. Your
> > project
> > > > > should
> > > > > > be configured based on your own needs.
> > > > > >
> > > > > > Your throughput could go up or lower depending on how you are
> > > > configured
> > > > > > based on what is important for your use case(s).
> > > > > >
> > > > > > If you have other apps running on the machine that would impact
> > your
> > > > > > results. If you only have a 2 CPU, 4GB laptop, obviously you
> cannot
> > > > > compare
> > > > > > the results with a server that has 256GB of RAM and 64 Cores.
> > > > > >
> > > > > > Also, do not measure it in terms of messages per second but more
> in
> > > > terms
> > > > > > of data volume per second. A throughput of 100GBps will give you
> > 100
> > > > > > messages per second 1 GB per message or 100,000 messages per
> second
> > > at
> > > > > 1KB
> > > > > > each if you have smaller messages the same volume will give a
> > higher
> > > > > count
> > > > > > of messages for the same unit time.
> > > > > >
> > > > > > Take a look at the reference architecture and this best practices
> > > > > document
> > > > > > for how to optimize your performance based on your project goals
> > > > > > (durability, latency, throughput and availability)
> > > > > >
> > > > > > Confluent Platform Reference Architecture - Confluent
> > > > > > <
> > > > > >
> > > > >
> > > >
> > >
> >
> https://www.confluent.io/thank-you/resources/apache-kafka-confluent-enterprise-reference-architecture/
> > > > > > >
> > > > > > Kafka Best Practices: Build, Monitor & Optimize Kafka in
> Confluent
> > > > Cloud
> > > > > > <
> > > > > >
> > > > >
> > > >
> > >
> >
> https://www.confluent.io/thank-you/resources/recommendations-developers-using-confluent-cloud/
> > > > > > >
> > > > > >
> > > > > > Everybody's scenario and use case will impact how they set up
> their
> > > > > > project. You cannot look at another project and use their numbers
> > for
> > > > > your
> > > > > > own set up. That is generally a bad idea and the better answer is
> > > that
> > > > > you
> > > > > > will need to define your project objectives and then figure out
> > what
> > > is
> > > > > > needed to achieve those goals.
> > > > > >
> > > > > > The better question is to take a look at what volume throughput,
> > > > > retention
> > > > > > policy and period as well as environment and then figure out the
> > > > capacity
> > > > > > planning necessary to support what you need.
> > > > > >
> > > > > > You can achieve any performance benchmark you are willing to pay
> > > for. I
> > > > > am
> > > > > > not a fan of just blindly copying other people's numbers and
> using
> > it
> > > > out
> > > > > > of context in benchmarks comparisons.
> > > > > >
> > > > > > Take a look at the capacity planner and sizing calculator to
> figure
> > > out
> > > > > > what hardware and infrastructure you need for your scenario
> > > > > >
> > > > > > Sizing Calculator for Apache Kafka and Confluent Platform (
> > > > eventsizer.io
> > > > > )
> > > > > > <https://eventsizer.io/>
> > > > > >
> > > > > > I hope this is more useful.
> > > > > >
> > > > > >
> > > > > > Israel Ekpo
> > > > > > Lead Instructor, IzzyAcademy.com
> > > > > > https://www.youtube.com/c/izzyacademy
> > > > > > https://izzyacademy.com/
> > > > > >
> > > > > >
> > > > > > On Thu, Jan 6, 2022 at 6:07 PM Marisa Queen <
> > > marisa.queen...@gmail.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Joris,
> > > > > > >
> > > > > > > Thank you so much, friend!
> > > > > > >
> > > > > > > > I appreciate that setting up everything on localhost will be
> > > easier
> > > > > and
> > > > > > > lead to big numbers, but bear in mind that it's typically all
> the
> > > > other
> > > > > > > real-life stuff (remote connections, replication, at-least
> once,
> > > ...)
> > > > > > that
> > > > > > > causes massive slowdowns compared to localhost
> > > > > > >
> > > > > > > Totally agree! But we must establish a ceiling first. If this
> > > > > > > super-good-loopback number doesn't look good, then one has no
> > > > business
> > > > > > > moving forward with Kafka to the more complex (and of course
> > > slower)
> > > > > > stuff.
> > > > > > >
> > > > > > > The purpose of the ceiling is that. It is your maximum ambition
> > > > > > represented
> > > > > > > by a number. You can't go any higher than that. At least with
> > > Kafka.
> > > > > > >
> > > > > > > Agree?
> > > > > > >
> > > > > > > Cheers,
> > > > > > >
> > > > > > > M. Queen
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Jan 6, 2022 at 3:51 PM Joris Peeters <
> > > > > joris.mg.peet...@gmail.com
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > These tutorials - though quite a bit outdated - seem quite
> > > useful:
> > > > > > > >
> > > > http://cloudurable.com/blog/kafka-tutorial-kafka-producer/index.html
> > > > > > > (and
> > > > > > > > the follow-ups).
> > > > > > > > Ends up being close to how I write this in Java, and tutorial
> > 13
> > > > > talks
> > > > > > > > about batching and acks etc, which you'll need in order to
> tune
> > > to
> > > > > > > maximise
> > > > > > > > your throughput.
> > > > > > > >
> > > > > > > > I'm sure someone else has better example resources.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Jan 6, 2022 at 6:25 PM Marisa Queen <
> > > > > marisa.queen...@gmail.com
> > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Joris,
> > > > > > > > >
> > > > > > > > > Thank you so much. I plan to write a Java Consumer and a
> Java
> > > > > > Producer,
> > > > > > > > for
> > > > > > > > > my benchmark. Do you recommend an example that I can use
> as a
> > > > > > reference
> > > > > > > > to
> > > > > > > > > write my basic Java producer and simple Java consumer? I'll
> > for
> > > > > sure
> > > > > > > > share
> > > > > > > > > the through number I get with the community. Maybe even
> > write a
> > > > > blog
> > > > > > > post
> > > > > > > > > about it. I hope it is more than 23 messages per second per
> > > > > partition
> > > > > > > > > :PPPPP
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > >
> > > > > > > > > M. Queen
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Jan 6, 2022 at 2:14 PM Joris Peeters <
> > > > > > > joris.mg.peet...@gmail.com
> > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > I'd just follow the instructions in
> > > > > > > > https://kafka.apache.org/quickstart
> > > > > > > > > to
> > > > > > > > > > set up Kafka and Zookeeper on a single node, by running
> the
> > > > Java
> > > > > > > > > processes
> > > > > > > > > > directly. Or can run in Docker.
> > > > > > > > > >
> > > > > > > > > > For the producer and consumer I'd personally use Python,
> as
> > > > it's
> > > > > > the
> > > > > > > > > > easiest to get going. You may want to look at
> > > > > > > > > > https://kafka-python.readthedocs.io/en/master/# (easier)
> > and
> > > > > > > > > > https://github.com/confluentinc/confluent-kafka-python
> > > > (faster).
> > > > > > > > Similar
> > > > > > > > > > things exist for Go, Java, C++, ...
> > > > > > > > > > Or I'm sure there are some benchmark setups out there
> that
> > > you
> > > > > can
> > > > > > > > tweak
> > > > > > > > > a
> > > > > > > > > > little.
> > > > > > > > > >
> > > > > > > > > > I appreciate that setting up everything on localhost will
> > be
> > > > > easier
> > > > > > > and
> > > > > > > > > > lead to big numbers, but bear in mind that it's typically
> > all
> > > > the
> > > > > > > other
> > > > > > > > > > real-life stuff (remote connections, replication,
> > > > at-least-once,
> > > > > > ...)
> > > > > > > > > that
> > > > > > > > > > causes massive slowdowns compared to localhost, and those
> > are
> > > > > > things
> > > > > > > > > banks
> > > > > > > > > > eventually tend to need (I work in finance industry
> > myself).
> > > > What
> > > > > > > > you're
> > > > > > > > > > doing is a very useful benchmark, but I'd surround it
> with
> > > the
> > > > > > above
> > > > > > > > > > caveats to avoid overpromising.
> > > > > > > > > >
> > > > > > > > > > -J
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Thu, Jan 6, 2022 at 4:58 PM Marisa Queen <
> > > > > > > marisa.queen...@gmail.com
> > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Joris,
> > > > > > > > > > >
> > > > > > > > > > > I've spoken to him. His answers are below:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jan 6, 2022 at 1:37 PM Joris Peeters <
> > > > > > > > > joris.mg.peet...@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > There's a few unknown parameters here that might
> > > influence
> > > > > the
> > > > > > > > > answer,
> > > > > > > > > > > > though. From the top of my head, at least
> > > > > > > > > > > > - How much replication of the data is needed (for
> high
> > > > > > > > availability),
> > > > > > > > > > and
> > > > > > > > > > > > how many acks for the producer? (If fire-and-forget
> it
> > > can
> > > > be
> > > > > > > > faster,
> > > > > > > > > > if
> > > > > > > > > > > > need to replicate and ack from 3 brokers in different
> > > DC's
> > > > > then
> > > > > > > > will
> > > > > > > > > be
> > > > > > > > > > > > slower)
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Let's assume no high-availability for now, for
> > simplicity's
> > > > > sake.
> > > > > > > > > > > Fire-and-forget like he said. We don't want to
> > > overcomplicate
> > > > > > this
> > > > > > > > > simple
> > > > > > > > > > > benchmark and we want the highest possible throughput
> > > number.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > - Transactions? (If end-to-end exactly-once then
> it's a
> > > lot
> > > > > > > slower)
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Again no transactions. Let's keep it simple.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > - Size of the messages? (If each message is a GB it
> > will
> > > > > > > obviously
> > > > > > > > be
> > > > > > > > > > > > slower)
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Let's assume 512 bytes. Powers of two are fun!
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > - Distance and bandwidth between the producers,
> Kafka &
> > > the
> > > > > > > > > consumers?
> > > > > > > > > > > (If
> > > > > > > > > > > > the network links get saturated that would limit the
> > > > > > performance.
> > > > > > > > > > Latency
> > > > > > > > > > > > is likely less important than throughput, but if your
> > > > > consumers
> > > > > > > are
> > > > > > > > > in
> > > > > > > > > > > > Tokyo and the producer in London then it will likely
> > also
> > > > be
> > > > > > > > slower)
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Loopback, same machine, for the love of God. Let's not
> > even
> > > > go
> > > > > > > there.
> > > > > > > > > We
> > > > > > > > > > > want the highest possible throughput. I accept the
> limit
> > of
> > > > the
> > > > > > > speed
> > > > > > > > > of
> > > > > > > > > > > light. If network particularities, and distances, are
> to
> > be
> > > > > > > included
> > > > > > > > in
> > > > > > > > > > > this measurement then it is basically worth nothing.
> > > Loopback
> > > > > > > > > eliminates
> > > > > > > > > > > all those network variables that we surely don't want
> to
> > > > > include
> > > > > > in
> > > > > > > > the
> > > > > > > > > > > benchmark.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > FWIW, I find that the producer side is generally the
> > > > limiting
> > > > > > > > factor,
> > > > > > > > > > > > especially if there is only one.
> > > > > > > > > > > > I'd take a look at e.g. the Appendix test details on
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > >
> > https://docs.confluent.io/2.0.0/clients/librdkafka/INTRODUCTION_8md.html
> > > > > > > > > > > .
> > > > > > > > > > > > I
> > > > > > > > > > > > haven't yet seen a faster Kafka impl than rdkafka, so
> > > those
> > > > > > would
> > > > > > > > be
> > > > > > > > > > > > reasonable upper bounds.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks for your reply, Joris. Can you point me to a
> Hello
> > > > World
> > > > > > > Kafka
> > > > > > > > > > > example, so I can set up this very basic and BARE BONES
> > > Kafka
> > > > > > > system,
> > > > > > > > > > > without any of the complications you correctly
> mentioned
> > > > > above? I
> > > > > > > > have
> > > > > > > > > 10
> > > > > > > > > > > million messages that I need to send from producers to
> > > > > > consumers. I
> > > > > > > > > have
> > > > > > > > > > 1
> > > > > > > > > > > topic, 1 producer for this topic, 4 partitions for this
> > > topic
> > > > > > and 4
> > > > > > > > > > > consumers, one for each partition. Everything loopback,
> > > same
> > > > > > > machine,
> > > > > > > > > no
> > > > > > > > > > > high-availability, transactions, etc. just KAFKA BARE
> > > BONES.
> > > > > What
> > > > > > > can
> > > > > > > > > be
> > > > > > > > > > > more trivial and basic than that?
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > >
> > > > > > > > > > > M. Queen
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Jan 6, 2022 at 4:25 PM Marisa Queen <
> > > > > > > > > marisa.queen...@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Israel,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Your email is great, but I'm afraid to forward it
> to
> > my
> > > > > > > customer
> > > > > > > > > > > because
> > > > > > > > > > > > it
> > > > > > > > > > > > > doesn't answer his question.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm hoping that other members from this list will
> be
> > > able
> > > > > to
> > > > > > > give
> > > > > > > > > me
> > > > > > > > > > a
> > > > > > > > > > > > more
> > > > > > > > > > > > > NUMERIC answer, let's wait to see.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Just to give you some follow up on your answer,
> when
> > > you
> > > > > say:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > 30 passengers per driver or aircraft per day may
> > not
> > > > > sound
> > > > > > > > > > impressive
> > > > > > > > > > > > but
> > > > > > > > > > > > > 750,000 passengers per day all together is how you
> > > should
> > > > > > look
> > > > > > > at
> > > > > > > > > it
> > > > > > > > > > > > >
> > > > > > > > > > > > > Well, with this rationality one can come up with
> any
> > > > > desired
> > > > > > > > > > throughput
> > > > > > > > > > > > > number by just adding more partitions. Do you see
> my
> > > > > customer
> > > > > > > > point
> > > > > > > > > > > that
> > > > > > > > > > > > > this does not make any sense? Adding more
> partitions
> > > also
> > > > > > does
> > > > > > > > not
> > > > > > > > > > come
> > > > > > > > > > > > for
> > > > > > > > > > > > > free, because messages need to be separated into
> the
> > > > newly
> > > > > > > > created
> > > > > > > > > > > > > partition and ordering will be lost. Order is
> > important
> > > > for
> > > > > > > some
> > > > > > > > > > > > messages,
> > > > > > > > > > > > > so to keep adding more partitions towards an
> infinite
> > > > > > > throughput
> > > > > > > > is
> > > > > > > > > > not
> > > > > > > > > > > > an
> > > > > > > > > > > > > option.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I've just spoken to him here, his reply was:
> > > > > > > > > > > > >
> > > > > > > > > > > > > "Marisa, I'm asking a very simple question for a
> very
> > > > basic
> > > > > > > Kafka
> > > > > > > > > > > > scenario.
> > > > > > > > > > > > > If I can't get an answer for that, then I'm in
> > trouble.
> > > > Can
> > > > > > you
> > > > > > > > > > please
> > > > > > > > > > > > find
> > > > > > > > > > > > > out with your peers/community what is a good
> > throughput
> > > > > > number
> > > > > > > to
> > > > > > > > > > have
> > > > > > > > > > > in
> > > > > > > > > > > > > mind for the scenario I've been describing. Again
> it
> > > is a
> > > > > > very
> > > > > > > > > basic
> > > > > > > > > > > and
> > > > > > > > > > > > > simple scenario: I have 10 million messages that I
> > need
> > > > to
> > > > > > send
> > > > > > > > > from
> > > > > > > > > > > > > producers to consumers. Let's assume I have 1
> topic,
> > 1
> > > > > > producer
> > > > > > > > for
> > > > > > > > > > > this
> > > > > > > > > > > > > topic, 4 partitions for this topic and 4 consumers,
> > one
> > > > for
> > > > > > > each
> > > > > > > > > > > > partition.
> > > > > > > > > > > > > What I would like to know is: How long is it going
> to
> > > > take
> > > > > > for
> > > > > > > > > these
> > > > > > > > > > 10
> > > > > > > > > > > > > million messages to travel all the way from the
> > > producer
> > > > to
> > > > > > the
> > > > > > > > > > > > consumers?
> > > > > > > > > > > > > That's the throughput performance number I'm
> > interested
> > > > > in."
> > > > > > > > > > > > >
> > > > > > > > > > > > > I surely won't tell him: "Hey, that's easy, you
> have
> > 4
> > > > > > > > partitions,
> > > > > > > > > > each
> > > > > > > > > > > > > partition according to LinkedIn can handle 23
> > messages
> > > > per
> > > > > > > > second,
> > > > > > > > > so
> > > > > > > > > > > we
> > > > > > > > > > > > > are looking for a 92 messages per second throughput
> > > > here!"
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > >
> > > > > > > > > > > > > M. Queen
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Jan 6, 2022 at 12:58 PM Israel Ekpo <
> > > > > > > > israele...@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Marisa
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think there may be some confusion about the
> > > > throughput
> > > > > > for
> > > > > > > > each
> > > > > > > > > > > > > partition
> > > > > > > > > > > > > > and I want to explain briefly using some
> analogies
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Using transportation for example if we were to
> pick
> > > an
> > > > > > > airline
> > > > > > > > or
> > > > > > > > > > > > > > ridesharing organization to describe the volume
> of
> > > > > > customers
> > > > > > > > they
> > > > > > > > > > can
> > > > > > > > > > > > > > support per day we would have to look at how many
> > > total
> > > > > > > > customers
> > > > > > > > > > can
> > > > > > > > > > > > > > American Airlines service in a day or how many
> > > > customers
> > > > > > can
> > > > > > > > Uber
> > > > > > > > > > or
> > > > > > > > > > > > Lyft
> > > > > > > > > > > > > > serve in a day. We would not zero in on only the
> > > number
> > > > > of
> > > > > > > > > > customers
> > > > > > > > > > > a
> > > > > > > > > > > > > > particular driver can service or the number of
> > > > passengers
> > > > > > are
> > > > > > > > > > > > particular
> > > > > > > > > > > > > > aircraft than service in a day. That would be
> very
> > > > > limiting
> > > > > > > > > > > considering
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > hundreds of thousands of aircrafts or drivers
> > > actively
> > > > > > > > > transporting
> > > > > > > > > > > > > > passengers in real time.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 30 passengers per driver or aircraft per day may
> > not
> > > > > sound
> > > > > > > > > > impressive
> > > > > > > > > > > > but
> > > > > > > > > > > > > > 750,000 passengers per day all together is how
> you
> > > > should
> > > > > > > look
> > > > > > > > at
> > > > > > > > > > it
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Partitions in Kafka are just a logical unit for
> > > > > organizing
> > > > > > > and
> > > > > > > > > > > storing
> > > > > > > > > > > > > data
> > > > > > > > > > > > > > within a Kafka topic. You should not base your
> > > analysis
> > > > > on
> > > > > > > just
> > > > > > > > > > what
> > > > > > > > > > > a
> > > > > > > > > > > > > > subunit of storage is able to support.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I would recommend taking a look at Kafka Summit
> > talks
> > > > on
> > > > > > > > > > performance
> > > > > > > > > > > > and
> > > > > > > > > > > > > > benchmarks to get some understanding how what
> Kafka
> > > is
> > > > > able
> > > > > > > to
> > > > > > > > do
> > > > > > > > > > and
> > > > > > > > > > > > the
> > > > > > > > > > > > > > applicable use cases in the Financial Services
> > > industry
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > A lot of reputable organizations already trust
> > Kafka
> > > > > today
> > > > > > > for
> > > > > > > > > > their
> > > > > > > > > > > > > needs
> > > > > > > > > > > > > > so this is already proven
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > https://kafka.apache.org/powered-by
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I hope this helps.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Israel Ekpo
> > > > > > > > > > > > > > Lead Instructor, IzzyAcademy.com
> > > > > > > > > > > > > > https://www.youtube.com/c/izzyacademy
> > > > > > > > > > > > > > https://izzyacademy.com/
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Jan 6, 2022 at 10:01 AM Marisa Queen <
> > > > > > > > > > > > marisa.queen...@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cheers from NYC!
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I'm trying to give a performance number to a
> > > > potential
> > > > > > > client
> > > > > > > > > > (from
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > financial market) who asked me the following
> > > > question:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > *"If I have a Kafka system setup in the best
> way
> > > > > possible
> > > > > > > for
> > > > > > > > > > > > > > performance,
> > > > > > > > > > > > > > > what is an approximate number that I can have
> in
> > > mind
> > > > > for
> > > > > > > the
> > > > > > > > > > > > > throughput
> > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > this system?"*
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The client proceeded to say:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > *"What I want to know specifically, is how many
> > > > > messages
> > > > > > > per
> > > > > > > > > > second
> > > > > > > > > > > > > can I
> > > > > > > > > > > > > > > send from one side of my distributed system to
> > the
> > > > > other
> > > > > > > side
> > > > > > > > > > with
> > > > > > > > > > > > > Apache
> > > > > > > > > > > > > > > Kafka."*
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > And he concluded with:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > *"To give you an example, let's say I have 10
> > > million
> > > > > > > > messages
> > > > > > > > > > > that I
> > > > > > > > > > > > > > need
> > > > > > > > > > > > > > > to send from producers to consumers. Let's
> > assume I
> > > > > have
> > > > > > 1
> > > > > > > > > > topic, 1
> > > > > > > > > > > > > > > producer for this topic, 4 partitions for this
> > > topic
> > > > > and
> > > > > > 4
> > > > > > > > > > > consumers,
> > > > > > > > > > > > > one
> > > > > > > > > > > > > > > for each partition. What I would like to know
> is:
> > > How
> > > > > > long
> > > > > > > is
> > > > > > > > > it
> > > > > > > > > > > > going
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > take for these 10 million messages to travel
> all
> > > the
> > > > > way
> > > > > > > from
> > > > > > > > > the
> > > > > > > > > > > > > > producer
> > > > > > > > > > > > > > > to the consumers? That's the throughput
> > performance
> > > > > > number
> > > > > > > > I'm
> > > > > > > > > > > > > interested
> > > > > > > > > > > > > > > in."*
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I read in a reddit post yesterday (for some
> > reason
> > > I
> > > > > > can't
> > > > > > > > find
> > > > > > > > > > the
> > > > > > > > > > > > > post
> > > > > > > > > > > > > > > anymore) that Kafka is able to handle 7
> trillion
> > > > > messages
> > > > > > > per
> > > > > > > > > > day.
> > > > > > > > > > > > The
> > > > > > > > > > > > > > > LinkedIn article about it, says:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > *"We maintain over 100 Kafka clusters with more
> > > than
> > > > > > 4,000
> > > > > > > > > > brokers,
> > > > > > > > > > > > > which
> > > > > > > > > > > > > > > serve more than 100,000 topics and 7 million
> > > > > partitions.
> > > > > > > The
> > > > > > > > > > total
> > > > > > > > > > > > > number
> > > > > > > > > > > > > > > of messages handled by LinkedIn’s Kafka
> > deployments
> > > > > > > recently
> > > > > > > > > > > > surpassed
> > > > > > > > > > > > > 7
> > > > > > > > > > > > > > > trillion per day."*
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The OP of the reddit post went on to say that
> > > > WhatsApp
> > > > > is
> > > > > > > > > > handling
> > > > > > > > > > > > > around
> > > > > > > > > > > > > > > 64 billion messages per day (740,000 msgs per
> > sec x
> > > > 24
> > > > > x
> > > > > > > 60 x
> > > > > > > > > 60)
> > > > > > > > > > > and
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > 7
> > > > > > > > > > > > > > > trillion for LinkedIn is a huge number, giving
> a
> > > > > whopping
> > > > > > > 81
> > > > > > > > > > > million
> > > > > > > > > > > > > > > messages per second for LinkedIn. But that
> > doesn't
> > > > > matter
> > > > > > > for
> > > > > > > > > my
> > > > > > > > > > > > > > question.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 7 Trillion messages divided by 7 million
> > partitions
> > > > > gives
> > > > > > > us
> > > > > > > > 1
> > > > > > > > > > > > million
> > > > > > > > > > > > > > > messages per day per partition. So to calculate
> > the
> > > > > > > > throughput
> > > > > > > > > we
> > > > > > > > > > > do:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >     1 million divided by 60 divided by 60
> divided
> > > by
> > > > 24
> > > > > > =>
> > > > > > > > *23
> > > > > > > > > > > > messages
> > > > > > > > > > > > > > per
> > > > > > > > > > > > > > > second per partition*
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > We'll all agree that 23 messages per second per
> > > > > partition
> > > > > > > for
> > > > > > > > > > > > > throughput
> > > > > > > > > > > > > > > performance is very low, so I can't give this
> > > number
> > > > to
> > > > > > my
> > > > > > > > > > > potential
> > > > > > > > > > > > > > > client.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So my question is: *What number should I give
> to
> > my
> > > > > > > potential
> > > > > > > > > > > > client?*
> > > > > > > > > > > > > > Note
> > > > > > > > > > > > > > > that he is a stubborn and strict bank CTO, so
> he
> > > > won't
> > > > > > take
> > > > > > > > any
> > > > > > > > > > > talk
> > > > > > > > > > > > > from
> > > > > > > > > > > > > > > me. He wants a mathematical answer using the
> > > > scientific
> > > > > > > > method.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Has anyone been in my shoes and can shed some
> > light
> > > > on
> > > > > > this
> > > > > > > > > kafka
> > > > > > > > > > > > > > > throughput performance topic?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > M. Queen
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > --
> > > > Israel Ekpo
> > > > Lead Instructor, IzzyAcademy.com
> > > > https://www.youtube.com/c/izzyacademy
> > > > https://izzyacademy.com/
> > > >
> > >
> >
>
