Hi

I am new to the Kafka world and am running into a scaling problem, so I
thought I would reach out to the community to see if someone can help.
The problem: I am trying to consume from a Kafka topic that peaks at 12
million messages/hour. That topic is not under my control; it has 7
partitions and carries JSON payloads.
I wrote a consumer (in Java, using the Spring Kafka library) that reads
the data, filters it, and loads it into a database. I ran into huge
consumer lag that takes 10-12 hours to catch up. I run 7 instances of my
application to match the 7 partitions, and I am using auto commit.
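For context, that first version was essentially this shape (simplified;
the topic, table, payload, and filter here are placeholders, not my real
names):

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

// Simplified sketch of my first consumer. enable.auto.commit=true is
// set in the consumer config; all names below are placeholders.
@Component
public class RawEventListener {

    record Payload(String id, String body) {}

    private final JdbcTemplate jdbcTemplate;

    public RawEventListener(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @KafkaListener(topics = "raw-events")
    public void onMessage(String json) {
        Payload payload = parse(json);
        if (!passesFilter(payload)) {
            return; // drop records that fail the filter
        }
        // One synchronous INSERT per message, per partition thread.
        jdbcTemplate.update(
                "INSERT INTO events (id, body) VALUES (?, ?)",
                payload.id(), payload.body());
    }

    private Payload parse(String json) {
        return new Payload(json, json); // stand-in for real JSON parsing
    }

    private boolean passesFilter(Payload p) {
        return p.body() != null; // stand-in for the real filter
    }
}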
Then I thought of splitting the write logic into a separate layer. So now
my architecture has one component that reads and filters and produces the
data to an internal topic (also 7 partitions, but this one is under my
control), and a consumer that picks the data up from that topic and writes
it to the database. It's better, but the consumer lag still takes 3-5
hours to catch up.
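The split version is roughly this (again placeholder names;
"filtered-events" stands in for the internal topic I control):

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;

// Stage 1: read the external topic, filter, republish to my internal
// topic. Topic names and the filter are placeholders.
@Component
public class FilterStage {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public FilterStage(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    @KafkaListener(topics = "raw-events")
    public void onMessage(String json) {
        if (matchesFilter(json)) {
            // send() is asynchronous, so this stage keeps up much
            // better than the version that wrote to the database inline.
            kafkaTemplate.send("filtered-events", json);
        }
    }

    private boolean matchesFilter(String json) {
        return !json.isEmpty(); // stand-in for the real filter
    }
}

// Stage 2 is a separate consumer on "filtered-events" that does the
// database INSERTs, the same shape as the first listener above.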
Am I missing something fundamental? Are there any other optimization ideas
that could help overcome this scale challenge? Any pointers or articles
would help too.

Appreciate your help with this.

Thanks
Yana
