Re: multi-threaded consumer configuration like stream threads?

Pushkar Deole Mon, 23 Nov 2020 09:37:43 -0800

Thanks Haruki. Right now we expect at most 100 such events, since that is
the number of customers (accounts) we will support for now. For that volume
we are considering a simple approach: a single consumer and a thread pool
with around 10 threads. So the question is how to manage failed events:
should they be retried in place until successful, or sent to a dead letter
queue/topic from which they are reprocessed until successful?
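
One way to combine the two options is to retry a bounded number of times and
only then dead-letter the event. The following is a minimal sketch under
stated assumptions, not a definitive implementation: it uses only the JDK,
the class name RetryThenDeadLetter and its methods are hypothetical, events
are plain strings, and the deadLetter callback stands in for whatever would
actually produce to a dead letter topic.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper, not part of the Kafka client API.
class RetryThenDeadLetter {
    private final ExecutorService pool;
    private final int maxAttempts;
    private final Consumer<String> handler;    // e.g. database provisioning for one account
    private final Consumer<String> deadLetter; // stand-in for producing to a dead letter topic

    RetryThenDeadLetter(int threads, int maxAttempts,
                        Consumer<String> handler, Consumer<String> deadLetter) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.maxAttempts = maxAttempts;
        this.handler = handler;
        this.deadLetter = deadLetter;
    }

    // Called by the single consumer thread for each polled event.
    void submit(String event) {
        pool.submit(() -> {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    handler.accept(event);
                    return; // processed successfully
                } catch (RuntimeException e) {
                    // Retry immediately; a real system would likely back off here.
                }
            }
            deadLetter.accept(event); // all attempts failed
        });
    }

    // Stops accepting work and waits for in-flight events to finish.
    boolean shutdownAndWait(long timeoutSeconds) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With around 100 events and 10 threads, a bounded retry keeps one failing
account from occupying a worker indefinitely, while the dead letter path
keeps the failed event available for later reprocessing.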



On Mon, Nov 23, 2020 at 10:16 PM Haruki Okada <ocadar...@gmail.com> wrote:

> Hi Pushkar.
>
> Just for your information, https://github.com/line/decaton is a Kafka
> consumer framework that supports parallel processing per single partition.
>
> It tracks the committable offset (i.e. the offset up to which all preceding
> offsets have been processed) internally, so it preserves at-least-once
> semantics even when processing in parallel.
>
>
> On Tue, Nov 24, 2020 at 1:16 Pushkar Deole <pdeole2...@gmail.com> wrote:
>
> > Thanks Liam!
> > We don't have a requirement to maintain order of processing for events,
> > even within a partition. Essentially, these are events for the various
> > accounts (customers) that we want to support, for which we need to do
> > the necessary provisioning in our database. So they can be processed in
> > parallel.
> >
> > I think the 2nd option would suit our requirement to have a single
> > consumer and a bounded thread pool for processors. However, the issue we
> > may face is committing offsets only after an event has been processed,
> > since we don't want the consumer to auto-commit offsets before the
> > provisioning is done for the customer. How can that be achieved with
> > model #2?
> >
> > On Tue, Oct 27, 2020 at 2:50 PM Liam Clarke-Hutchinson <
> > liam.cla...@adscale.co.nz> wrote:
> >
> > > Hi Pushkar,
> > >
> > > No. You'd need to combine a consumer with a thread pool or similar as
> > > you prefer. As the docs say (from
> > > https://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> > > )
> > >
> > > > We have intentionally avoided implementing a particular threading
> > > > model for processing. This leaves several options for implementing
> > > > multi-threaded processing of records.
> > > > 1. One Consumer Per Thread
> > > > A simple option is to give each thread its own consumer instance.
> > > > Here are the pros and cons of this approach:
> > > >
> > > >    - *PRO*: It is the easiest to implement
> > > >
> > > >    - *PRO*: It is often the fastest as no inter-thread co-ordination
> > > >    is needed
> > > >
> > > >    - *PRO*: It makes in-order processing on a per-partition basis very
> > > >    easy to implement (each thread just processes messages in the
> > > >    order it receives them).
> > > >
> > > >    - *CON*: More consumers means more TCP connections to the cluster
> > > >    (one per thread). In general Kafka handles connections very
> > > >    efficiently so this is generally a small cost.
> > > >
> > > >    - *CON*: Multiple consumers means more requests being sent to the
> > > >    server and slightly less batching of data which can cause some
> > > >    drop in I/O throughput.
> > > >
> > > >    - *CON*: The number of total threads across all processes will be
> > > >    limited by the total number of partitions.
> > > >
> > > > 2. Decouple Consumption and Processing
> > > > Another alternative is to have one or more consumer threads that do
> > > > all data consumption and hands off ConsumerRecords
> > > > <https://kafka.apache.org/26/javadoc/org/apache/kafka/clients/consumer/ConsumerRecords.html>
> > > > instances to a blocking queue consumed by a pool of processor threads
> > > > that actually handle the record processing. This option likewise has
> > > > pros and cons:
> > > >
> > > >    - *PRO*: This option allows independently scaling the number of
> > > >    consumers and processors. This makes it possible to have a single
> > > >    consumer that feeds many processor threads, avoiding any limitation
> > > >    on partitions.
> > > >
> > > >    - *CON*: Guaranteeing order across the processors requires
> > > >    particular care as the threads will execute independently an
> > > >    earlier chunk of data may actually be processed after a later chunk
> > > >    of data just due to the luck of thread execution timing. For
> > > >    processing that has no ordering requirements this is not a problem.
> > > >
> > > >    - *CON*: Manually committing the position becomes harder as it
> > > >    requires that all threads co-ordinate to ensure that processing is
> > > >    complete for that partition.
> > > >
> > > > There are many possible variations on this approach. For example each
> > > > processor thread can have its own queue, and the consumer threads can
> > > > hash into these queues using the TopicPartition to ensure in-order
> > > > consumption and simplify commit.
> > >
> > >
> > > Cheers,
> > >
> > > Liam Clarke-Hutchinson
> > >
> > > On Tue, Oct 27, 2020 at 8:04 PM Pushkar Deole <pdeole2...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Is there any configuration in kafka consumer to specify multiple
> > > > threads the way it is there in kafka streams?
> > > > Essentially, can we have a consumer with multiple threads where the
> > > > threads would divide partitions of topic among them?
> > > >
> > >
> >
>
>
> --
> ========================
> Okada Haruki
> ocadar...@gmail.com
> ========================
>
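
On the question raised in the thread about model #2 (committing offsets only
after processing when records are handled out of order by a pool), one
common idea is to track, per partition, the lowest offset that is still
being processed. The following is a minimal sketch of that bookkeeping only,
under stated assumptions: it uses only the JDK, the class name
CommittableOffsetTracker and its methods are hypothetical, it covers a
single partition, and it assumes the consumer thread calls begin() for every
polled record in offset order. It is not how decaton or any other library is
implemented.

```java
import java.util.TreeSet;

// Hypothetical helper; tracks the offsets of ONE partition.
class CommittableOffsetTracker {
    private final TreeSet<Long> inFlight = new TreeSet<>();
    private long highestSeen = -1L;

    // Consumer thread: call before handing the record to the thread pool.
    synchronized void begin(long offset) {
        inFlight.add(offset);
        highestSeen = Math.max(highestSeen, offset);
    }

    // Worker thread: call once the record is fully processed (or dead-lettered).
    synchronized void complete(long offset) {
        inFlight.remove(offset);
    }

    // Offset that is safe to commit, or -1 if no record has been seen yet.
    // A committed offset names the NEXT record to read, so when nothing is
    // in flight the answer is highestSeen + 1; otherwise it is the lowest
    // offset still being processed.
    synchronized long committable() {
        if (highestSeen < 0) {
            return -1L;
        }
        return inFlight.isEmpty() ? highestSeen + 1 : inFlight.first();
    }
}
```

With enable.auto.commit set to false, the consumer thread could periodically
pass committable() to KafkaConsumer.commitSync as an OffsetAndMetadata for
that TopicPartition. The commit has to happen on the consumer thread, since
KafkaConsumer is not safe for multi-threaded access. After a crash, records
at or above the committed offset are redelivered, which gives at-least-once
processing, so the provisioning step should be idempotent.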
