Hi.

Yeah, Spring-Kafka processes messages sequentially per partition, so consumer
throughput is capped by the per-message database latency.
One possible solution is to create an intermediate topic (or alter the source
topic) with many more partitions, as Marina suggested.
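
For reference, that extra repartitioning step can be a trivial pass-through
service. A minimal sketch with plain kafka-clients (topic names, group id,
bootstrap servers and the String serdes are placeholders; error handling
omitted):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class Repartitioner {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "repartitioner");
        consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            // Read from the original 7-partition topic...
            consumer.subscribe(Collections.singletonList("topic1"));
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    // ...and forward unchanged to a topic created with more
                    // partitions (say 40). Reusing the original key keeps
                    // per-key ordering in the wider topic.
                    producer.send(new ProducerRecord<>("topic2", record.key(), record.value()));
                }
            }
        }
    }
}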

I'd like to suggest another solution: multi-threaded processing within a
single partition.
Decaton (https://github.com/line/decaton) is a library that achieves this.
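
To illustrate the core idea, here is a minimal hand-rolled sketch using a
plain KafkaConsumer and a thread pool (topic/group names, the pool size and
writeToDb are placeholders for your actual setup). Decaton implements the
same idea but also takes care of per-key ordering, back-pressure and
fine-grained offset management, which this sketch glosses over:

import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class MultiThreadedDbWriter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "db-writer");
        // Commit manually, only after the whole batch is actually written.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // 10 DB writes in flight per consumer instance, even with one partition.
        ExecutorService workers = Executors.newFixedThreadPool(10);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("topic2"));
            while (true) {
                List<CompletableFuture<Void>> batch = new ArrayList<>();
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    batch.add(CompletableFuture.runAsync(() -> writeToDb(record), workers));
                }
                // Block until every write in the batch is done, so committed
                // offsets never run ahead of the DB. One slow write stalls the
                // whole batch - one of the problems Decaton solves more cleverly.
                CompletableFuture.allOf(batch.toArray(new CompletableFuture[0])).join();
                consumer.commitSync();
            }
        }
    }

    private static void writeToDb(ConsumerRecord<String, String> record) {
        // Placeholder for the actual filter + database insert.
    }
}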

Confluent has also published a blog post about parallel-consumer (
https://www.confluent.io/blog/introducing-confluent-parallel-message-processing-client/)
for the same purpose, but it seems to still be in beta.

On Sun, Dec 20, 2020 at 11:41 Marina Popova <ppine7...@protonmail.com.invalid> wrote:

> The way I see it, you can only do a few things, if you are sure there is
> no room for perf optimization of the processing itself:
> 1. speed up your processing per consumer thread: which you already tried
> by splitting your logic into a 2-step pipeline instead of 1-step, and
> delegating the work of writing to a DB to the second step (make sure your
> second intermediate Kafka topic is created with many more partitions to be
> able to parallelize your work much more - I'd say at least 40)
> 2. if you can change the incoming topic - I would create it with many more
> partitions as well - say at least 40 or so - to parallelize your first-step
> service processing more
> 3. and if you can't increase partitions for the original topic - you
> could artificially achieve the same by adding one more step (service) in
> your pipeline that would just read data from the original 7-partition
> topic1 and push it unchanged into a new topic2 with, say, 40
> partitions - and then have your other services pick up from this topic2
>
>
> good luck,
> Marina
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Saturday, December 19, 2020 6:46 PM, Yana K <yanak1...@gmail.com>
> wrote:
>
> > Hi
> >
> > I am new to the Kafka world and running into this scale problem. I
> > thought of reaching out to the community if someone can help.
> > So the problem is I am trying to consume from a Kafka topic that can
> > have a peak of 12 million messages/hour. That topic is not under my
> > control - it has 7 partitions and sends JSON payloads.
> > I have written a consumer (I've used Java and the Spring-Kafka lib) that
> > reads that data, filters it and then loads it into a database. I ran
> > into a huge consumer lag that would take 10-12 hours to catch up. I have
> > 7 instances of my application running to match the 7 partitions and I am
> > using auto commit. Then I thought of splitting the write logic into a
> > separate layer. So now my architecture has a component that reads and
> > filters and produces the data to an internal topic (I've done 7
> > partitions but, as you see, it's under my control). Then a consumer
> > picks up data from that topic and writes it to the database. It's better
> > but it still takes 3-5 hours for the consumer lag to catch up.
> > Am I missing something fundamental? Are there any other ideas for
> > optimization that can help overcome this scale challenge? Any pointers
> > and articles will help too.
> >
> > Appreciate your help with this.
> >
> > Thanks
> > Yana

-- 
========================
Okada Haruki
ocadar...@gmail.com
========================
