The way I see it, if you are sure there is no room for perf optimization of 
the processing itself, you can only do a few things:
1. speed up your processing per consumer thread - which you already tried by 
splitting your logic into a 2-step pipeline instead of 1-step and delegating 
the work of writing to the DB to the second step (make sure your 
intermediate Kafka topic is created with many more partitions so the second 
step can be parallelized much more - I'd say at least 40)
2. if you can change the incoming topic - I would create it with many more 
partitions as well - say at least 40 or so - to parallelize your first-step 
service's processing more
3. and if you can't increase partitions for the original topic - you could 
artificially achieve the same by adding one more step (service) to your 
pipeline that would just read data from the original 7-partition topic1 and 
push it unchanged into a new topic2 with, say, 40 partitions - and then 
have your other services pick up from this topic2
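To make point 3 concrete, here is a minimal sketch of such a pass-through 
"repartitioner" service using the plain kafka-clients API. Everything here is 
an assumption for illustration: the topic names ("topic1", "topic2"), the 
broker address, the group id, String key/value serdes, and the replication 
factor of 3 - adjust all of them to your setup. It also needs a running 
broker, so it is a sketch, not a drop-in implementation:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class Repartitioner {
    public static void main(String[] args) throws Exception {
        Properties common = new Properties();
        common.put("bootstrap.servers", "localhost:9092"); // assumption

        // One-time setup: create the wider target topic with 40 partitions
        // (replication factor 3 is an assumed cluster default).
        try (AdminClient admin = AdminClient.create(common)) {
            admin.createTopics(Collections.singleton(
                new NewTopic("topic2", 40, (short) 3))).all().get();
        }

        Properties consumerProps = new Properties();
        consumerProps.putAll(common);
        consumerProps.put("group.id", "repartitioner"); // assumed group id
        consumerProps.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.putAll(common);
        producerProps.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer =
                 new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer =
                 new KafkaProducer<>(producerProps)) {
            consumer.subscribe(Collections.singleton("topic1"));
            while (true) {
                ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    // Forward unchanged; reusing the original key keeps
                    // records for the same key on the same partition.
                    producer.send(
                        new ProducerRecord<>("topic2", r.key(), r.value()));
                }
            }
        }
    }
}
```

Note that forwarding with the original key preserves per-key ordering across 
the wider topic; forwarding with a null key would round-robin records and 
lose that ordering.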


good luck,
Marina


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Saturday, December 19, 2020 6:46 PM, Yana K <yanak1...@gmail.com> wrote:

> Hi
>
> I am new to the Kafka world and running into this scale problem. I thought
> of reaching out to the community if someone can help.
> So the problem is I am trying to consume from a Kafka topic that can have a
> peak of 12 million messages/hour. That topic is not under my control - it
> has 7 partitions and sends JSON payloads.
> I have written a consumer (I've used Java and Spring-Kafka lib) that will
> read that data, filter it, and then load it into a database. I ran into a
> huge consumer lag that would take 10-12 hours to catch up. I have 7
> instances of my application running to match the 7 partitions, and I am
> using auto commit. Then I thought of splitting the write logic to a
> separate layer. So now my architecture has a component that reads and
> filters and produces the data to an internal topic (I've done 7 partitions
> but as you see it's under my control). Then a consumer picks up data from
> that topic and writes it to the database. It's better, but it still takes
> 3-5 hours for the consumer lag to catch up.
> Am I missing something fundamentally? Are there any other ideas for
> optimization that can help overcome this scale challenge? Any pointers or
> articles will help too.
>
> Appreciate your help with this.
>
> Thanks
> Yana
