Hm, it's an optimization for the "first layer", so if the bottleneck is in
the "second layer" (i.e. the DB write), as you mentioned, I don't think it
will make much difference.
On Tue, Dec 22, 2020 at 16:02, Yana K wrote:
I thought about it, but we don't have much time. Will it improve
performance?
On Mon, Dec 21, 2020 at 4:16 PM Haruki Okada wrote:
This is about the "first layer", right?
Then make sure you don't call get() on the result of Producer#send() for
each message, because doing so spoils the producer's ability to batch.
The Kafka producer batches messages by default and it's very efficient, so if
you produce asynchronously, it rarely
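The batching effect described above can be illustrated without a broker. The sketch below is a toy model: `ToyProducer` and its `flush()` are hypothetical and are not the Kafka client API. Records handed over before the next shipment share a batch, which is roughly what the real producer's accumulator does. Blocking on each acknowledgement forces every record out alone, while sending everything first and waiting once lets records share batches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class BatchingDemo {
    // Toy stand-in for a batching producer (hypothetical, NOT the Kafka client):
    // records are buffered and shipped as one batch when the buffer is full
    // or when flush() is called.
    static final class ToyProducer {
        private final int maxBatchSize;
        private final List<CompletableFuture<Void>> pending = new ArrayList<>();
        int batchesSent = 0;

        ToyProducer(int maxBatchSize) {
            this.maxBatchSize = maxBatchSize;
        }

        CompletableFuture<Void> send(String value) {
            CompletableFuture<Void> ack = new CompletableFuture<>();
            pending.add(ack);
            if (pending.size() >= maxBatchSize) {
                flush(); // buffer full: ship the batch
            }
            return ack;
        }

        // Ships everything currently buffered as a single batch and acks it.
        void flush() {
            if (pending.isEmpty()) {
                return;
            }
            batchesSent++;
            for (CompletableFuture<Void> ack : pending) {
                ack.complete(null);
            }
            pending.clear();
        }
    }

    // Waiting on every ack: the wait cannot finish until the record is shipped,
    // so from a single sending thread each batch holds just one record.
    static int sendBlocking(int n, int maxBatchSize) {
        ToyProducer p = new ToyProducer(maxBatchSize);
        for (int i = 0; i < n; i++) {
            CompletableFuture<Void> ack = p.send("msg-" + i);
            p.flush();
            ack.join();
        }
        return p.batchesSent;
    }

    // Fire-and-collect: hand over every record first, then wait for all acks once.
    static int sendAsync(int n, int maxBatchSize) {
        ToyProducer p = new ToyProducer(maxBatchSize);
        List<CompletableFuture<Void>> acks = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            acks.add(p.send("msg-" + i));
        }
        p.flush(); // ship the partial last batch, if any
        CompletableFuture.allOf(acks.toArray(new CompletableFuture<?>[0])).join();
        return p.batchesSent;
    }

    public static void main(String[] args) {
        System.out.println("blocking: " + sendBlocking(1000, 100) + " batches");
        System.out.println("async:    " + sendAsync(1000, 100) + " batches");
    }
}
```

Running `main` reports 1000 batches for the blocking loop and 10 for the async one. With the real client, the asynchronous style corresponds to calling `send(record, callback)` (or collecting the returned `Future`s) rather than calling `get()` after every `send()`.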
Thanks!
Also, are there any producer optimizations anyone can think of in this
scenario?
On Mon, Dec 21, 2020 at 8:58 AM Joris Peeters wrote:
I'd probably just do it by experiment for your concrete data.
Maybe generate a few million synthetic data rows and insert them batch by
batch into a dev DB, with an outer grid search over various candidate batch
sizes. You're looking to optimise for flat-out rows/s, so whichever batch
size wins
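The experiment suggested above can be sketched as a small harness. This is a minimal sketch under stated assumptions: `insertBatch` is a hypothetical hook you would back with a real batched insert (for example JDBC `addBatch()`/`executeBatch()` against the dev DB), the row format is made up, and the candidate batch sizes are arbitrary. It times each candidate size and keeps whichever gives the highest rows/s.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntToDoubleFunction;

public class BatchSizeSearch {
    // Made-up synthetic rows; real ones should mimic your message payloads.
    static List<String> syntheticRows(int n) {
        List<String> rows = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            rows.add("row-" + i);
        }
        return rows;
    }

    // Inserts all rows in chunks of batchSize through insertBatch (hypothetical
    // hook, e.g. a JDBC batched insert) and returns the measured rows/s.
    static double rowsPerSecond(List<String> rows, int batchSize, Consumer<List<String>> insertBatch) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        long start = System.nanoTime();
        for (int i = 0; i < rows.size(); i += batchSize) {
            insertBatch.accept(rows.subList(i, Math.min(i + batchSize, rows.size())));
        }
        long elapsedNanos = Math.max(System.nanoTime() - start, 1L); // avoid dividing by zero
        return rows.size() / (elapsedNanos / 1e9);
    }

    // Outer grid search: returns the candidate with the highest measured rows/s.
    static int bestBatchSize(int[] candidates, IntToDoubleFunction measureRowsPerSec) {
        int best = candidates[0];
        double bestRate = Double.NEGATIVE_INFINITY;
        for (int size : candidates) {
            double rate = measureRowsPerSec.applyAsDouble(size);
            System.out.printf("batch size %d: %.0f rows/s%n", size, rate);
            if (rate > bestRate) {
                bestRate = rate;
                best = size;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> rows = syntheticRows(2_000);
        // Placeholder that only simulates a ~1 ms round trip per batch;
        // replace it with a real batched insert against your dev DB.
        Consumer<List<String>> fakeInsert = batch -> {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int best = bestBatchSize(new int[] {10, 100, 500, 1000},
                size -> rowsPerSecond(rows, size, fakeInsert));
        System.out.println("best batch size: " + best);
    }
}
```

Because the measurement is passed in as a function, the selection logic can be checked without a database, and the same harness works whichever DB client you end up using.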
Thanks Haruki and Joris.
Haruki:
Thanks for the detailed calculations. Really appreciate it. What tool/library
do you use to load test Kafka?
So we have one consumer group and are running 7 instances of the application;
that should be good enough, correct?
Joris:
Great point.
DB insert is a bottleneck (and
Do you know why your consumers are so slow? 12E6 msg/hour is about 3,333 msg/s,
which is not very high from a Kafka point of view. As you're doing database
inserts, I suspect that is where the bottleneck lies.
If, for example, you're doing a single-row insert in a SQL DB for every
message, then this would
About load testing:
I think it'd be better to monitor per-message processing latency and estimate
the required partition count from it, because that latency determines the max
throughput of a single partition.
- Say you have to process 12 million messages/hour = about 3,333 messages/sec.
- If you have 7 partitions (thus
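The partition estimate above can be written out as arithmetic. Assuming each partition is consumed strictly sequentially, one partition can process at most 1/latency messages per second, so the required partition count is the target rate times the per-message latency, rounded up. 12 million messages/hour is about 3,333 messages/s; the 10 ms latency used below is a hypothetical measurement for illustration, not a figure from the thread.

```java
public class PartitionEstimate {
    // Assumes strictly sequential processing per partition, so one partition
    // handles at most 1 / latency messages per second.
    static double maxPerPartitionPerSec(double latencyMillis) {
        return 1000.0 / latencyMillis;
    }

    // Partitions needed to keep up with the given hourly rate at the given latency.
    static int requiredPartitions(double messagesPerHour, double latencyMillis) {
        double perSecond = messagesPerHour / 3600.0;
        return (int) Math.ceil(perSecond / maxPerPartitionPerSec(latencyMillis));
    }

    // Largest per-message latency that a fixed partition count can sustain.
    static double latencyBudgetMillis(double messagesPerHour, int partitions) {
        double perSecond = messagesPerHour / 3600.0;
        return 1000.0 * partitions / perSecond;
    }

    public static void main(String[] args) {
        double rate = 12_000_000; // peak messages/hour, from the thread
        System.out.printf("target: %.0f msg/s%n", rate / 3600.0);
        System.out.printf("budget with 7 partitions: %.2f ms/msg%n", latencyBudgetMillis(rate, 7));
        // 10 ms is a hypothetical measured latency, not a number from the thread.
        System.out.println("partitions at 10 ms/msg: " + requiredPartitions(rate, 10.0));
    }
}
```

Run in reverse, the same arithmetic gives the latency budget: with 7 partitions, each message must be processed in about 2.1 ms to keep up with the peak, while a 10 ms per-message latency would need 34 partitions.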
So as the next step, I plan to increase the partitions of the 2nd topic. Do I
increase the number of consumer instances for it or keep it at 7?
Anything else (besides researching those libs)?
Are there any good tools for load testing Kafka?
On Sun, Dec 20, 2020 at 7:23 PM Haruki Okada wrote:
It depends on how you manually commit offsets.
Auto-commit basically commits offsets asynchronously, so as long as you
commit manually in the same way, there shouldn't be much difference.
And generally, the offset-commit mode doesn't make much difference in
performance, regardless of manual/auto
Thank you so much Marina and Haruki.
Marina's response:
- When you say "if you are sure there is no room for perf optimization of
the processing itself", do you mean code-level optimizations? Can you
please explain?
- On the second topic, you say "I'd say at least 40". Is this based on 12
Hi.
Yeah, Spring-Kafka processes messages sequentially, so the consumer
throughput is capped by the database latency of each message.
One possible solution is creating an intermediate topic (or altering the source
topic) with many more partitions, as Marina suggested.
I'd like to suggest
The way I see it, you can only do a few things if you are sure there is no
room for perf optimization of the processing itself:
1. speed up your processing per consumer thread, which you already tried by
splitting your logic into a 2-step pipeline instead of 1-step and delegating
the work
Hi
I am new to the Kafka world and running into this scale problem. I thought
I'd reach out to the community in case someone can help.
So the problem is I am trying to consume from a Kafka topic that can have a
peak of 12 million messages/hour. That topic is not under my control - it
has 7