So as the next step I see to increase the partition of the 2nd topic - do I increase the instances of the consumer from that or keep it at 7? Anything else (besides researching those libs)?
Are there any good tools for load testing kafka? On Sun, Dec 20, 2020 at 7:23 PM Haruki Okada <ocadar...@gmail.com> wrote: > It depends on how you manually commit offsets. > Auto-commit does commits offsets in async manner basically, so as long as > you do manual-commit in the same way, there should be no much difference. > > And, generally offset-commit mode doesn't make much difference in > performance regardless manual/auto or async/sync unless offset-commit > latency takes significant amount in processing time (e.g. you commit > offsets synchronously in every poll() loop). > > 2020年12月21日(月) 11:08 Yana K <yanak1...@gmail.com>: > > > Thank you so much Marina and Haruka. > > > > Marina's response: > > - When you say " if you are sure there is no room for perf optimization > of > > the processing itself :" - do you mean code level optimizations? Can you > > please explain? > > - On the second topic you say " I'd say at least 40" - is this based on > 12 > > million records / hour? > > - "if you can change the incoming topic" - I don't think it is possible > :( > > - "you could artificially achieve the same by adding one more step > > (service) in your pipeline" - this is the next thing - but I want to be > > sure this will help, given we've to maintain one more layer > > > > Haruka's response: > > - "One possible solution is creating an intermediate topic" - I already > did > > it > > - I'll look at Decaton - thx > > > > Is there any thoughts on the auto commit vs manual commit - if it can > > better the performance while consuming? > > > > Yana > > > > > > > > On Sat, Dec 19, 2020 at 7:01 PM Haruki Okada <ocadar...@gmail.com> > wrote: > > > > > Hi. > > > > > > Yeah, Spring-Kafka does processing messages sequentially, so the > consumer > > > throughput would be capped by database latency per single process. > > > One possible solution is creating an intermediate topic (or altering > > source > > > topic) with much more partitions as Marina suggested. > > > > > > I'd like to suggest another solution, that is multi-threaded processing > > per > > > single partition. > > > Decaton (https://github.com/line/decaton) is a library to achieve it. > > > > > > Also confluent has published a blog post about parallel-consumer ( > > > > > > > > > https://www.confluent.io/blog/introducing-confluent-parallel-message-processing-client/ > > > ) > > > for that purpose, but it seems it's still in the BETA stage. > > > > > > 2020年12月20日(日) 11:41 Marina Popova <ppine7...@protonmail.com.invalid>: > > > > > > > The way I see it - you can only do a few things - if you are sure > there > > > is > > > > no room for perf optimization of the processing itself : > > > > 1. speed up your processing per consumer thread: which you already > > tried > > > > by splitting your logic into a 2-step pipeline instead of 1-step, and > > > > delegating the work of writing to a DB to the second step ( make sure > > > your > > > > second intermediate Kafka topic is created with much more partitions > to > > > be > > > > able to parallelize your work much higher - I'd say at least 40) > > > > 2. if you can change the incoming topic - I would create it with many > > > more > > > > partitions as well - say at least 40 or so - to parallelize your > first > > > step > > > > service processing more > > > > 3. and if you can't increase partitions for the original topic ) - > you > > > > could artificially achieve the same by adding one more step (service) > > in > > > > your pipeline that would just read data from the original 7-partition > > > > topic1 and just push it unchanged into a new topic2 with , say 40 > > > > partitions - and then have your other services pick up from this > topic2 > > > > > > > > > > > > good luck, > > > > Marina > > > > > > > > Sent with ProtonMail Secure Email. > > > > > > > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ > > > > On Saturday, December 19, 2020 6:46 PM, Yana K <yanak1...@gmail.com> > > > > wrote: > > > > > > > > > Hi > > > > > > > > > > I am new to the Kafka world and running into this scale problem. I > > > > thought > > > > > of reaching out to the community if someone can help. > > > > > So the problem is I am trying to consume from a Kafka topic that > can > > > > have a > > > > > peak of 12 million messages/hour. That topic is not under my > control > > - > > > it > > > > > has 7 partitions and sending json payload. > > > > > I have written a consumer (I've used Java and Spring-Kafka lib) > that > > > will > > > > > read that data, filter it and then load it into a database. I ran > > into > > > a > > > > > huge consumer lag that would take 10-12hours to catch up. I have 7 > > > > > instances of my application running to match the 7 partitions and I > > am > > > > > using auto commit. Then I thought of splitting the write logic to a > > > > > separate layer. So now my architecture has a component that reads > and > > > > > filters and produces the data to an internal topic (I've done 7 > > > > partitions > > > > > but as you see it's under my control). Then a consumer picks up > data > > > from > > > > > that topic and writes it to the database. It's better but still it > > > takes > > > > > 3-5hours for the consumer lag to catch up. > > > > > Am I missing something fundamentally? Are there any other ideas for > > > > > optimization that can help overcome this scale challenge. Any > pointer > > > and > > > > > article will help too. > > > > > > > > > > Appreciate your help with this. > > > > > > > > > > Thanks > > > > > Yana > > > > > > > > > > > > > > > > > > -- > > > ======================== > > > Okada Haruki > > > ocadar...@gmail.com > > > ======================== > > > > > > > > -- > ======================== > Okada Haruki > ocadar...@gmail.com > ======================== >