Re: Kafka Scaling Ideas

2020-12-22 Thread Haruki Okada
Hm, it's an optimization for the "first layer", so if the bottleneck is in the "second layer" (i.e. the DB write) as you mentioned, it shouldn't make much of a difference, I think.

Re: Kafka Scaling Ideas

2020-12-21 Thread Yana K
I thought about it, but we don't have much time - will it optimize performance?

Re: Kafka Scaling Ideas

2020-12-21 Thread Haruki Okada
About the "first layer", right? Then it's better to make sure you don't get() the result of Producer#send() for each message, because doing that defeats producer batching. The Kafka producer batches messages by default and it's very efficient, so if you produce asynchronously, it rarely ...
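To illustrate the point, here is a minimal sketch of the two send styles (the topic name and error handling are placeholders, not from the thread):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerBatchingSketch {

    // Anti-pattern: blocking on every send keeps at most one record in flight
    // and effectively disables the producer's batching.
    static void sendBlocking(KafkaProducer<String, String> producer, String key, String value) throws Exception {
        producer.send(new ProducerRecord<>("second-topic", key, value)).get();
    }

    // Async send: the producer groups records into batches in the background
    // (tunable via batch.size and linger.ms); failures surface in the callback.
    static void sendAsync(KafkaProducer<String, String> producer, String key, String value) {
        producer.send(new ProducerRecord<>("second-topic", key, value), (metadata, exception) -> {
            if (exception != null) {
                // handle or log the failure
            }
        });
    }
}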

Re: Kafka Scaling Ideas

2020-12-21 Thread Yana K
Thanks! Also, are there any producer optimizations anyone can think of in this scenario?

Re: Kafka Scaling Ideas

2020-12-21 Thread Joris Peeters
I'd probably just do it by experiment for your concrete data. Maybe generate a few million synthetic data rows, and for-each-batch insert them into a dev DB, with an outer grid search over various candidate batch sizes. You're looking to optimise for flat-out rows/s, so whichever batch size wins ...
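A sketch of that experiment as I read it (the JDBC URL, table name, and synthetic payloads below are all made up for illustration):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.ArrayList;
import java.util.List;

public class BatchSizeBenchmark {

    public static void main(String[] args) throws Exception {
        int[] candidateBatchSizes = {100, 500, 1_000, 5_000, 10_000};

        // synthetic rows standing in for real messages
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            rows.add("row-" + i);
        }

        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://devdb/test", "user", "pass")) {
            conn.setAutoCommit(false);
            for (int batchSize : candidateBatchSizes) {
                long start = System.nanoTime();
                try (PreparedStatement ps = conn.prepareStatement("INSERT INTO events(payload) VALUES (?)")) {
                    int inBatch = 0;
                    for (String row : rows) {
                        ps.setString(1, row);
                        ps.addBatch();
                        if (++inBatch == batchSize) {
                            ps.executeBatch();
                            conn.commit();
                            inBatch = 0;
                        }
                    }
                    if (inBatch > 0) {
                        ps.executeBatch();
                        conn.commit();
                    }
                }
                double seconds = (System.nanoTime() - start) / 1e9;
                System.out.printf("batchSize=%d -> %.0f rows/s%n", batchSize, rows.size() / seconds);
            }
        }
    }
}

Whichever batch size gives the best flat-out rows/s on the dev DB is the one to start from; truncate the table between runs so each candidate sees the same conditions.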

Re: Kafka Scaling Ideas

2020-12-21 Thread Yana K
Thanks, Haruki and Joris. Haruki: thanks for the detailed calculations, really appreciate it. What tool/lib is used to load test Kafka? So we have one consumer group and are running 7 instances of the application - that should be good enough, correct? Joris: great point. The DB insert is a bottleneck (and ...

Re: Kafka Scaling Ideas

2020-12-21 Thread Joris Peeters
Do you know why your consumers are so slow? 12E6 msg/hour is about 3,333 msg/s, which is not very high from a Kafka point of view. As you're doing database inserts, I suspect that is where the bottleneck lies. If, for example, you're doing a single-row insert in a SQL DB for every message, then this would ...
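For contrast, a rough sketch of that per-message insert next to a per-poll-batch insert (table, column, and record mapping are assumptions):

import java.sql.Connection;
import java.sql.PreparedStatement;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;

public class InsertStyles {

    // Slow path: one database round trip per message.
    static void insertPerMessage(Connection conn, ConsumerRecord<String, String> record) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement("INSERT INTO events(payload) VALUES (?)")) {
            ps.setString(1, record.value());
            ps.executeUpdate();
        }
    }

    // Faster path: one JDBC batch per Kafka poll batch.
    static void insertPerPollBatch(Connection conn, ConsumerRecords<String, String> records) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement("INSERT INTO events(payload) VALUES (?)")) {
            for (ConsumerRecord<String, String> record : records) {
                ps.setString(1, record.value());
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}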

Re: Kafka Scaling Ideas

2020-12-21 Thread Haruki Okada
About the load test: I think it'd be better to monitor per-message process latency and estimate the required partition count based on it, because that determines the max throughput per single partition. - Say you have to process 12 million messages/hour, which is about 3,333 messages/sec. - If you have 7 partitions (thus ...
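Spelling out that arithmetic (the 12 million/hour and 7 partitions come from the thread; the 12 ms latency below is only an assumed example):

    12,000,000 msg/hour / 3,600 s  ~= 3,333 msg/s overall
    3,333 msg/s / 7 partitions     ~= 476 msg/s per partition
    => per-message processing must stay under ~1/476 s ~= 2.1 ms to keep up
    If the real per-message latency were, say, 12 ms, one partition could only
    sustain ~83 msg/s, so you'd need roughly 3,333 x 0.012 ~= 40 partitions.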

Re: Kafka Scaling Ideas

2020-12-20 Thread Yana K
So, as the next step, I see increasing the partition count of the 2nd topic - do I also increase the number of consumer instances for that, or keep it at 7? Anything else (besides researching those libs)? Are there any good tools for load testing Kafka?

Re: Kafka Scaling Ideas

2020-12-20 Thread Haruki Okada
It depends on how you commit offsets manually. Auto-commit commits offsets asynchronously, basically, so as long as you do manual commits the same way, there shouldn't be much difference. And generally, the offset-commit mode doesn't make much difference to performance, regardless of manual/auto ...
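As a rough sketch of "manual, but still async" commits (topic name and the process() helper are placeholders; assumes the consumer was created with enable.auto.commit=false):

import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ManualAsyncCommitSketch {

    static void pollLoop(KafkaConsumer<String, String> consumer) {
        consumer.subscribe(Collections.singletonList("second-topic"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                process(record);
            }
            // Non-blocking commit of the offsets from the last poll,
            // comparable to what auto-commit does in the background.
            consumer.commitAsync();
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        // application logic (e.g. the DB write) goes here
    }
}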

Re: Kafka Scaling Ideas

2020-12-20 Thread Yana K
Thank you so much, Marina and Haruki. Marina's response: - When you say "if you are sure there is no room for perf optimization of the processing itself" - do you mean code-level optimizations? Can you please explain? - On the second topic you say "I'd say at least 40" - is this based on 12 ...

Re: Kafka Scaling Ideas

2020-12-19 Thread Haruki Okada
Hi. Yeah, Spring-Kafka processes messages sequentially, so the consumer throughput will be capped by the database latency per single process. One possible solution is creating an intermediate topic (or altering the source topic) with many more partitions, as Marina suggested. I'd like to suggest ...
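A bare-bones sketch of that intermediate-topic "first layer" relay, assuming the second layer then consumes the wider topic and does the DB writes (topic names and the partition count are illustrative):

import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class FirstLayerRelay {

    // Reads the 7-partition source topic and fans records out to an
    // intermediate topic created with more partitions (e.g. 40), so more
    // consumer instances can share the DB-writing work downstream.
    static void relay(KafkaConsumer<String, String> consumer, KafkaProducer<String, String> producer) {
        consumer.subscribe(Collections.singletonList("source-topic"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                // async send so the relay itself stays cheap
                producer.send(new ProducerRecord<>("intermediate-topic", record.key(), record.value()));
            }
            consumer.commitAsync();
        }
    }
}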

Re: Kafka Scaling Ideas

2020-12-19 Thread Marina Popova
The way I see it, you can only do a few things, if you are sure there is no room for perf optimization of the processing itself: 1. speed up your processing per consumer thread, which you already tried by splitting your logic into a 2-step pipeline instead of 1-step and delegating the work ...

Kafka Scaling Ideas

2020-12-19 Thread Yana K
Hi, I am new to the Kafka world and am running into this scaling problem; I thought of reaching out to the community to see if someone can help. The problem is that I am trying to consume from a Kafka topic that can have a peak of 12 million messages/hour. That topic is not under my control - it has 7 ...