I think we can handle the failures selectively, e.g. if there is an issue
with the downstream database server, then all events will fail to process,
so it is worth retrying until the database recovers. If, on the other hand,
there is an issue only while processing a particular event, then we can
retry that event up to a timeout, and after the timeout expires move on and
take another event for processing.
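
Roughly, the dispatch I have in mind looks like the sketch below.
processEvent(), isConnectionError(), and the two constants are placeholders
for our own logic, not an existing API:

    import org.apache.kafka.clients.consumer.ConsumerRecord;

    // Sketch: retry indefinitely on infrastructure-level failures, but give a
    // single bad event a bounded retry budget and then move on to the next one.
    public class SelectiveRetryHandler {
        private static final int MAX_ATTEMPTS = 5;           // assumed per-event budget
        private static final long RETRY_BACKOFF_MS = 1_000L; // assumed backoff

        public void handle(ConsumerRecord<String, String> record) throws InterruptedException {
            int attempts = 0;
            while (true) {
                try {
                    processEvent(record);  // e.g. provision the account in the DB
                    return;
                } catch (Exception e) {
                    if (isConnectionError(e)) {
                        // The DB itself is down: every event would fail, so it
                        // is worth retrying until the DB recovers.
                        Thread.sleep(RETRY_BACKOFF_MS);
                    } else if (++attempts >= MAX_ATTEMPTS) {
                        // Event-specific failure: give up after the retry budget
                        // (optionally sending it to a dead letter topic) so the
                        // thread can take another event for processing.
                        return;
                    }
                }
            }
        }

        private void processEvent(ConsumerRecord<String, String> record) { /* DB provisioning */ }

        private boolean isConnectionError(Throwable t) {
            return t instanceof java.sql.SQLTransientConnectionException; // one possible check
        }
    }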

On Tue, Nov 24, 2020 at 5:35 AM Haruki Okada <ocadar...@gmail.com> wrote:

> I see.
>
> Then I think the appropriate approach depends on your delivery latency
> requirements.
> Just retrying until success is simpler, but it could block subsequent
> messages from being processed (this also depends on the thread pool size).
>
> Another concern when using a dead letter topic is the retry backoff.
> If you don't use backoff, production to the dead letter topic could burst
> when the downstream DB experiences transient problems; on the other hand,
> injecting a backoff delay requires some thought about how to avoid
> blocking subsequent messages.
>
> (FYI, Decaton provides retry-queueing with backoff out of the box. :)
> https://github.com/line/decaton/blob/master/docs/retry-queueing.adoc)
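
For illustration, one common way to implement the retry-topic side of this
is to attach an attempt counter and a "not before" timestamp, so a separate
retry consumer can delay redelivery without blocking the main consumer. A
minimal sketch, with a hypothetical topic name and header keys (this is not
how Decaton implements it):

    import java.nio.charset.StandardCharsets;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class RetryTopicForwarder {
        private static final String RETRY_TOPIC = "events-retry"; // hypothetical
        private static final long BACKOFF_MS = 30_000L;           // assumed delay

        private final KafkaProducer<String, String> producer;

        public RetryTopicForwarder(KafkaProducer<String, String> producer) {
            this.producer = producer;
        }

        // Forward a failed record with an attempt counter and a "retry after"
        // timestamp; the retry consumer reads these headers and waits until the
        // deadline before reprocessing, instead of bursting redeliveries.
        public void forward(ConsumerRecord<String, String> failed, int attempt) {
            ProducerRecord<String, String> retry =
                new ProducerRecord<>(RETRY_TOPIC, failed.key(), failed.value());
            retry.headers().add("attempt",
                Integer.toString(attempt).getBytes(StandardCharsets.UTF_8));
            retry.headers().add("retry-after-ms",
                Long.toString(System.currentTimeMillis() + BACKOFF_MS)
                    .getBytes(StandardCharsets.UTF_8));
            producer.send(retry);
        }
    }
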
>
> On Tue, Nov 24, 2020 at 2:38 Pushkar Deole <pdeole2...@gmail.com> wrote:
>
> > Thanks Haruki... right now the maximum number of such events we would
> > have is 100, since that is how many customers (accounts) we will support
> > for now. For that we are considering a simple approach: a single
> > consumer and a thread pool of around 10 threads. So the question is how
> > to manage failed events: should they be retried until successful, or
> > sent to a dead letter queue/topic from which they will be processed
> > again until successful?
> >
> >
> > On Mon, Nov 23, 2020 at 10:16 PM Haruki Okada <ocadar...@gmail.com> wrote:
> >
> > > Hi Pushkar.
> > >
> > > Just for your information, https://github.com/line/decaton is a Kafka
> > > consumer framework that supports parallel processing within a single
> > > partition.
> > >
> > > It manages the committable offset (i.e. the offset up to which all
> > > preceding offsets have been processed) internally, so it preserves
> > > at-least-once semantics even when processing in parallel.
> > >
> > >
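
The committable-offset idea can be made concrete with a small bookkeeping
class. This is a generic sketch of the technique, assuming offsets are
contiguous from the starting position; it is not Decaton's actual code:

    import java.util.TreeSet;

    // Tracks the highest offset N such that every offset <= N has been
    // processed, so committing N + 1 preserves at-least-once semantics even
    // when records complete out of order across parallel workers.
    public class CommittableOffsetTracker {
        private final TreeSet<Long> completedOutOfOrder = new TreeSet<>();
        private long committable; // all offsets <= this are fully processed

        public CommittableOffsetTracker(long initialCommitted) {
            this.committable = initialCommitted; // typically firstOffset - 1
        }

        public synchronized void markCompleted(long offset) {
            if (offset == committable + 1) {
                committable = offset;
                // absorb any later offsets that already completed out of order
                while (!completedOutOfOrder.isEmpty()
                        && completedOutOfOrder.first() == committable + 1) {
                    committable = completedOutOfOrder.pollFirst();
                }
            } else if (offset > committable) {
                completedOutOfOrder.add(offset);
            }
        }

        public synchronized long committableOffset() {
            return committable;
        }
    }
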
> > > On Tue, Nov 24, 2020 at 1:16 Pushkar Deole <pdeole2...@gmail.com> wrote:
> > >
> > > > Thanks Liam!
> > > > We don't have a requirement to maintain the order of processing for
> > > > events, even within a partition. Essentially, these are events for
> > > > the various accounts (customers) that we want to support, for which
> > > > we do the necessary provisioning in our database, so they can be
> > > > processed in parallel.
> > > >
> > > > I think the 2nd option would suit our requirement of a single
> > > > consumer and a bounded thread pool of processors. However, the issue
> > > > we may face is committing the offsets only after an event has been
> > > > processed, since we don't want the consumer to auto-commit offsets
> > > > before the provisioning is done for the customer. How can that be
> > > > achieved with model #2?
> > > >
> > > > On Tue, Oct 27, 2020 at 2:50 PM Liam Clarke-Hutchinson <liam.cla...@adscale.co.nz> wrote:
> > > >
> > > > > Hi Pushkar,
> > > > >
> > > > > No. You'd need to combine a consumer with a thread pool or similar
> > > > > as you prefer. As the docs say (from
> > > > > https://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html):
> > > > >
> > > > > > We have intentionally avoided implementing a particular threading
> > > > > > model for processing. This leaves several options for
> > > > > > implementing multi-threaded processing of records.
> > > > > >
> > > > > > 1. One Consumer Per Thread
> > > > > > A simple option is to give each thread its own consumer instance.
> > > > > > Here are the pros and cons of this approach:
> > > > > >
> > > > > >    - *PRO*: It is the easiest to implement
> > > > > >    - *PRO*: It is often the fastest as no inter-thread
> > > > > >      co-ordination is needed
> > > > > >    - *PRO*: It makes in-order processing on a per-partition basis
> > > > > >      very easy to implement (each thread just processes messages
> > > > > >      in the order it receives them).
> > > > > >    - *CON*: More consumers means more TCP connections to the
> > > > > >      cluster (one per thread). In general Kafka handles
> > > > > >      connections very efficiently so this is generally a small
> > > > > >      cost.
> > > > > >    - *CON*: Multiple consumers means more requests being sent to
> > > > > >      the server and slightly less batching of data which can
> > > > > >      cause some drop in I/O throughput.
> > > > > >    - *CON*: The number of total threads across all processes will
> > > > > >      be limited by the total number of partitions.
> > > > > >
> > > > > > 2. Decouple Consumption and Processing
> > > > > > Another alternative is to have one or more consumer threads that
> > > > > > do all data consumption and hand off ConsumerRecords
> > > > > > <https://kafka.apache.org/26/javadoc/org/apache/kafka/clients/consumer/ConsumerRecords.html>
> > > > > > instances to a blocking queue consumed by a pool of processor
> > > > > > threads that actually handle the record processing. This option
> > > > > > likewise has pros and cons:
> > > > > >
> > > > > >    - *PRO*: This option allows independently scaling the number
> > > > > >      of consumers and processors. This makes it possible to have
> > > > > >      a single consumer that feeds many processor threads,
> > > > > >      avoiding any limitation on partitions.
> > > > > >    - *CON*: Guaranteeing order across the processors requires
> > > > > >      particular care; as the threads execute independently, an
> > > > > >      earlier chunk of data may actually be processed after a
> > > > > >      later chunk of data just due to the luck of thread execution
> > > > > >      timing. For processing that has no ordering requirements
> > > > > >      this is not a problem.
> > > > > >    - *CON*: Manually committing the position becomes harder as it
> > > > > >      requires that all threads co-ordinate to ensure that
> > > > > >      processing is complete for that partition.
> > > > > >
> > > > > > There are many possible variations on this approach. For example
> > > > > > each processor thread can have its own queue, and the consumer
> > > > > > threads can hash into these queues using the TopicPartition to
> > > > > > ensure in-order consumption and simplify commit.
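
A minimal sketch of that last variation follows: per-worker queues hashed
by TopicPartition, auto-commit disabled, and offsets committed only once
processing is complete. The worker count and the provision() step are made
up, and it reuses the CommittableOffsetTracker sketched earlier in the
thread:

    import java.time.Duration;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    public class DecoupledConsumerLoop {
        private static final int WORKERS = 10; // hypothetical pool size

        // The consumer is assumed to have enable.auto.commit=false.
        public static void run(KafkaConsumer<String, String> consumer) throws InterruptedException {
            // One queue per worker; a partition always hashes to the same
            // worker, so per-partition ordering is preserved if ever needed.
            List<BlockingQueue<ConsumerRecord<String, String>>> queues = new ArrayList<>();
            Map<TopicPartition, CommittableOffsetTracker> trackers = new HashMap<>();
            for (int i = 0; i < WORKERS; i++) {
                BlockingQueue<ConsumerRecord<String, String>> queue = new ArrayBlockingQueue<>(1000);
                queues.add(queue);
                Thread worker = new Thread(() -> {
                    while (!Thread.currentThread().isInterrupted()) {
                        try {
                            ConsumerRecord<String, String> rec = queue.take();
                            provision(rec); // hypothetical DB provisioning step
                            TopicPartition tp = new TopicPartition(rec.topic(), rec.partition());
                            synchronized (trackers) {
                                trackers.get(tp).markCompleted(rec.offset());
                            }
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    }
                }, "worker-" + i);
                worker.setDaemon(true);
                worker.start();
            }

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> rec : records) {
                    TopicPartition tp = new TopicPartition(rec.topic(), rec.partition());
                    synchronized (trackers) {
                        trackers.putIfAbsent(tp, new CommittableOffsetTracker(rec.offset() - 1));
                    }
                    queues.get((tp.hashCode() & 0x7fffffff) % WORKERS).put(rec);
                }
                // Commit only offsets whose predecessors are fully processed.
                Map<TopicPartition, OffsetAndMetadata> toCommit = new HashMap<>();
                synchronized (trackers) {
                    trackers.forEach((tp, tracker) -> {
                        long done = tracker.committableOffset();
                        if (done >= 0) toCommit.put(tp, new OffsetAndMetadata(done + 1));
                    });
                }
                if (!toCommit.isEmpty()) consumer.commitSync(toCommit);
            }
        }

        private static void provision(ConsumerRecord<String, String> rec) { /* DB work */ }
    }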
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Liam Clarke-Hutchinson
> > > > >
> > > > > On Tue, Oct 27, 2020 at 8:04 PM Pushkar Deole <pdeole2...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Is there any configuration in the Kafka consumer to specify
> > > > > > multiple threads, the way there is in Kafka Streams?
> > > > > > Essentially, can we have a consumer with multiple threads, where
> > > > > > the threads divide the partitions of a topic among themselves?
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > ========================
> > > Okada Haruki
> > > ocadar...@gmail.com
> > > ========================
> > >
> >
>
>
> --
> ========================
> Okada Haruki
> ocadar...@gmail.com
> ========================
>
