Thanks Liam! We don't have a requirement to maintain the order of event processing, even within a partition. Essentially, these are events for various accounts (customers) that we want to support, and for each one we do the necessary provisioning in our database. So they can be processed in parallel.
I think the 2nd option would suit our requirement to have a single consumer and a bounded thread pool for processors. However, the issue we may face is committing offsets only after an event has been processed, since we don't want the consumer to auto-commit offsets before the provisioning is done for the customer. How can that be achieved with model #2?

On Tue, Oct 27, 2020 at 2:50 PM Liam Clarke-Hutchinson <liam.cla...@adscale.co.nz> wrote:

> Hi Pushkar,
>
> No. You'd need to combine a consumer with a thread pool or similar as you
> prefer. As the docs say (from
> https://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> )
>
> > We have intentionally avoided implementing a particular threading model for
> > processing. This leaves several options for implementing multi-threaded
> > processing of records.
> >
> > 1. One Consumer Per Thread
> >
> > A simple option is to give each thread its own consumer instance. Here are
> > the pros and cons of this approach:
> >
> > - *PRO*: It is the easiest to implement
> >
> > - *PRO*: It is often the fastest as no inter-thread co-ordination is
> >   needed
> >
> > - *PRO*: It makes in-order processing on a per-partition basis very
> >   easy to implement (each thread just processes messages in the order it
> >   receives them).
> >
> > - *CON*: More consumers means more TCP connections to the cluster (one
> >   per thread). In general Kafka handles connections very efficiently so
> >   this is generally a small cost.
> >
> > - *CON*: Multiple consumers means more requests being sent to the
> >   server and slightly less batching of data which can cause some drop in
> >   I/O throughput.
> >
> > - *CON*: The number of total threads across all processes will be
> >   limited by the total number of partitions.
> >
> > 2. Decouple Consumption and Processing
> >
> > Another alternative is to have one or more consumer threads that do all
> > data consumption and hands off ConsumerRecords
> > <https://kafka.apache.org/26/javadoc/org/apache/kafka/clients/consumer/ConsumerRecords.html>
> > instances to a blocking queue consumed by a pool of processor threads that
> > actually handle the record processing. This option likewise has pros and
> > cons:
> >
> > - *PRO*: This option allows independently scaling the number of
> >   consumers and processors. This makes it possible to have a single
> >   consumer that feeds many processor threads, avoiding any limitation on
> >   partitions.
> >
> > - *CON*: Guaranteeing order across the processors requires particular
> >   care as the threads will execute independently an earlier chunk of data
> >   may actually be processed after a later chunk of data just due to the
> >   luck of thread execution timing. For processing that has no ordering
> >   requirements this is not a problem.
> >
> > - *CON*: Manually committing the position becomes harder as it
> >   requires that all threads co-ordinate to ensure that processing is
> >   complete for that partition.
> >
> > There are many possible variations on this approach. For example each
> > processor thread can have its own queue, and the consumer threads can hash
> > into these queues using the TopicPartition to ensure in-order consumption
> > and simplify commit.
>
> Cheers,
>
> Liam Clarke-Hutchinson
>
> On Tue, Oct 27, 2020 at 8:04 PM Pushkar Deole <pdeole2...@gmail.com> wrote:
>
> > Hi,
> >
> > Is there any configuration in kafka consumer to specify multiple threads
> > the way it is there in kafka streams?
> > Essentially, can we have a consumer with multiple threads where the
> > threads would divide partitions of topic among them?
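
One possible way to approach the commit question for model #2 is to track, per partition, which offsets have been handed to the thread pool and which have finished, and to commit only up to the earliest offset still in flight. The following is a minimal sketch under stated assumptions, not a definitive implementation: `OffsetTracker` and its method names are hypothetical (they are not part of the Kafka client), one tracker instance is assumed per partition, and `enable.auto.commit` is assumed to be set to `false`. Since `KafkaConsumer` is not thread-safe, the consumer thread itself would periodically read the committable offset and pass it to `KafkaConsumer.commitSync(Map<TopicPartition, OffsetAndMetadata>)`.

```java
import java.util.TreeSet;

/**
 * Hypothetical helper (not a Kafka class): tracks records of ONE partition
 * that were handed to worker threads and may complete out of order.
 */
final class OffsetTracker {
    // Offsets dispatched to workers whose processing has not finished yet.
    private final TreeSet<Long> inFlight = new TreeSet<>();
    // Highest offset ever dispatched; -1 means nothing dispatched so far.
    private long highestDispatched = -1L;

    /** Called by the consumer thread before handing a record to the pool. */
    synchronized void dispatched(long offset) {
        inFlight.add(offset);
        highestDispatched = Math.max(highestDispatched, offset);
    }

    /** Called by a worker thread once processing of the record succeeded. */
    synchronized void done(long offset) {
        inFlight.remove(offset);
    }

    /**
     * Offset that is safe to commit, following Kafka's convention that the
     * committed offset is the position of the NEXT record to read.
     * Returns -1 if nothing has been dispatched yet.
     */
    synchronized long committableOffset() {
        if (highestDispatched < 0) {
            return -1L;
        }
        // Everything below the earliest unfinished record is fully processed.
        return inFlight.isEmpty() ? highestDispatched + 1 : inFlight.first();
    }
}
```

With this approach, a crash before the commit means the uncommitted records are delivered again after restart, so the provisioning step would need to tolerate being run more than once for the same event.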