Thank you so much Marina and Haruka. Marina's response: - When you say " if you are sure there is no room for perf optimization of the processing itself :" - do you mean code level optimizations? Can you please explain? - On the second topic you say " I'd say at least 40" - is this based on 12 million records / hour? - "if you can change the incoming topic" - I don't think it is possible :( - "you could artificially achieve the same by adding one more step (service) in your pipeline" - this is the next thing - but I want to be sure this will help, given we've to maintain one more layer
Haruka's response: - "One possible solution is creating an intermediate topic" - I already did it - I'll look at Decaton - thx Is there any thoughts on the auto commit vs manual commit - if it can better the performance while consuming? Yana On Sat, Dec 19, 2020 at 7:01 PM Haruki Okada <ocadar...@gmail.com> wrote: > Hi. > > Yeah, Spring-Kafka does processing messages sequentially, so the consumer > throughput would be capped by database latency per single process. > One possible solution is creating an intermediate topic (or altering source > topic) with much more partitions as Marina suggested. > > I'd like to suggest another solution, that is multi-threaded processing per > single partition. > Decaton (https://github.com/line/decaton) is a library to achieve it. > > Also confluent has published a blog post about parallel-consumer ( > > https://www.confluent.io/blog/introducing-confluent-parallel-message-processing-client/ > ) > for that purpose, but it seems it's still in the BETA stage. > > 2020年12月20日(日) 11:41 Marina Popova <ppine7...@protonmail.com.invalid>: > > > The way I see it - you can only do a few things - if you are sure there > is > > no room for perf optimization of the processing itself : > > 1. speed up your processing per consumer thread: which you already tried > > by splitting your logic into a 2-step pipeline instead of 1-step, and > > delegating the work of writing to a DB to the second step ( make sure > your > > second intermediate Kafka topic is created with much more partitions to > be > > able to parallelize your work much higher - I'd say at least 40) > > 2. if you can change the incoming topic - I would create it with many > more > > partitions as well - say at least 40 or so - to parallelize your first > step > > service processing more > > 3. and if you can't increase partitions for the original topic ) - you > > could artificially achieve the same by adding one more step (service) in > > your pipeline that would just read data from the original 7-partition > > topic1 and just push it unchanged into a new topic2 with , say 40 > > partitions - and then have your other services pick up from this topic2 > > > > > > good luck, > > Marina > > > > Sent with ProtonMail Secure Email. > > > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ > > On Saturday, December 19, 2020 6:46 PM, Yana K <yanak1...@gmail.com> > > wrote: > > > > > Hi > > > > > > I am new to the Kafka world and running into this scale problem. I > > thought > > > of reaching out to the community if someone can help. > > > So the problem is I am trying to consume from a Kafka topic that can > > have a > > > peak of 12 million messages/hour. That topic is not under my control - > it > > > has 7 partitions and sending json payload. > > > I have written a consumer (I've used Java and Spring-Kafka lib) that > will > > > read that data, filter it and then load it into a database. I ran into > a > > > huge consumer lag that would take 10-12hours to catch up. I have 7 > > > instances of my application running to match the 7 partitions and I am > > > using auto commit. Then I thought of splitting the write logic to a > > > separate layer. So now my architecture has a component that reads and > > > filters and produces the data to an internal topic (I've done 7 > > partitions > > > but as you see it's under my control). Then a consumer picks up data > from > > > that topic and writes it to the database. It's better but still it > takes > > > 3-5hours for the consumer lag to catch up. > > > Am I missing something fundamentally? Are there any other ideas for > > > optimization that can help overcome this scale challenge. Any pointer > and > > > article will help too. > > > > > > Appreciate your help with this. > > > > > > Thanks > > > Yana > > > > > > > > -- > ======================== > Okada Haruki > ocadar...@gmail.com > ======================== >