Re: Uneven distribution of messages in topic's partitions

2020-06-19 Thread Nag Y
Hi Ricardo, Just a follow-up question: I believe the DefaultPartitioner uses murmur2 by default. Should the RoundRobinPartitioner class be used instead of the default partitioner to get as even a distribution as possible? Is the StickyPartitioner (mentioned above) different from Roun
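
To make the trade-off in this question concrete, here is a minimal sketch that compares hash-based placement with round-robin placement on a skewed set of keys. It does not use the Kafka client: `keyed_partition`, `round_robin_partitioner`, the 3-partition topic and the key names are all hypothetical, and CRC32 stands in for the client's real hash (the Java client hashes the serialized key with murmur2 when a key is present).

```python
import zlib
from collections import Counter
from itertools import count

NUM_PARTITIONS = 3  # hypothetical topic size, chosen for illustration


def keyed_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Hash-based placement: the same key always lands on the same partition.

    CRC32 is only a stand-in for the client's real hash function.
    """
    return zlib.crc32(key) % num_partitions


def round_robin_partitioner(num_partitions: int = NUM_PARTITIONS):
    """Return a chooser that ignores the key and cycles through partitions."""
    counter = count()

    def choose(_key: bytes) -> int:
        return next(counter) % num_partitions

    return choose


# A skewed workload: 90 of 100 records share one key.
keys = [b"customer-1"] * 90 + [b"customer-%d" % i for i in range(2, 12)]

hashed = Counter(keyed_partition(k) for k in keys)
choose = round_robin_partitioner()
rotated = Counter(choose(k) for k in keys)

print("hash-based :", dict(hashed))
print("round-robin:", dict(rotated))
```

With hash-based placement the hot key pins at least 90 records to a single partition, while round-robin spreads the same records almost evenly. The price of round-robin is that records sharing a key no longer stay in one partition, so per-key ordering is lost.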

Re: Uneven distribution of messages in topic's partitions

2020-06-19 Thread Ricardo Ferreira
Hi Hemant, Looking up specific records by key is not possible in Kafka. As a distributed streaming platform based on the concept of a commit log, Kafka organizes data sequentially, where each record has an offset that uniquely identifies not who the record is but where within the log i
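
The point about offsets can be illustrated with a toy append-only log. This is only a sketch of the idea, not how Kafka stores data: `CommitLog` and its methods are hypothetical names, and a Python list stands in for a partition.

```python
class CommitLog:
    """Toy append-only log: records are addressed by position, not by key."""

    def __init__(self):
        self._records = []

    def append(self, key, value) -> int:
        """Append a record and return the offset it was written at."""
        self._records.append((key, value))
        return len(self._records) - 1

    def read(self, offset: int):
        """Direct access by offset: the log knows where a record is."""
        return self._records[offset]

    def scan_for_key(self, key):
        """There is no key index, so finding a key means reading everything."""
        return [(off, v) for off, (k, v) in enumerate(self._records) if k == key]


log = CommitLog()
log.append("order-1", "created")
log.append("order-2", "created")
log.append("order-1", "shipped")

print(log.read(2))                  # the record stored at offset 2
print(log.scan_for_key("order-1"))  # every record for the key, found by a full scan
```

Reading by offset is a direct jump, whereas answering "what is the latest state of order-1" requires scanning the log or maintaining a separate keyed store built from it.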

Re: Duplicate records on consumer side.

2020-06-19 Thread Ricardo Ferreira
Hi Sunil, No worries about the mix-up in the email. I totally understand! "So now, if I have already started 3 instances on 3 servers with 3 threads each, then to better utilize them, I have to increase partitions. Right?" -- Yes, you got that right. To make this easier to understand... always think in ter

Re: Uneven distribution of messages in topic's partitions

2020-06-19 Thread Hemant Bairwa
Thanks Ricardo. I need some information on one more use case. In my application I need to use Kafka to maintain the different workflow states of message items as they pass through different processes. For example, in my application all messages transit from Process A to Process Z and I need to mai

Re: Broker thread pool sizing

2020-06-19 Thread sunil chaudhari
This is awesome. Thanks Ricardo. On Fri, 19 Jun 2020 at 9:31 PM, Ricardo Ferreira wrote: > Gérald, > > Typically you should set the `num.io.threads` to something greater than > the # of disks since data hits the page cache and the disk. Using the > default of 8 when you have a JBOD of 12 attache

Re: Duplicate records on consumer side.

2020-06-19 Thread sunil chaudhari
Hi, Thanks for the clarification. This means that, for consumer group "A", running one consumer instance with 3 threads on one server is equivalent to running 3 different instances with one thread each on 3 different servers. So now, if I have already started 3 instances on 3 servers with 3 threads each, then to

Re: Broker thread pool sizing

2020-06-19 Thread Gérald Quintana
Thank you very much, Ricardo. It's crystal clear. Gérald On Fri, 19 Jun 2020 at 18:01, Ricardo Ferreira wrote: > Gérald, > > Typically you should set the `num.io.threads` to something greater than > the # of disks since data hits the page cache and the disk. Using the > default of 8 when you

Re: Broker thread pool sizing

2020-06-19 Thread Ricardo Ferreira
Gérald, Typically you should set `num.io.threads` to something greater than the number of disks, since data hits the page cache and the disk. Using the default of 8 when you have a JBOD of 12 attached volumes would cause an increase in CPU context switching, for example. `num.network.threads`
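
The rule of thumb above (more I/O threads than disks) can be turned into a small sanity check. This is only a sketch: `parse_properties` and `io_threads_ok` are hypothetical helpers, the properties text is a made-up fragment rather than a recommended configuration, and the value 16 is purely illustrative. The only figure taken from this thread is the default of 8.

```python
DEFAULT_IO_THREADS = 8  # default for num.io.threads as quoted in this thread


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def io_threads_ok(properties_text: str, disk_count: int) -> bool:
    """True when num.io.threads is greater than the number of disks."""
    props = parse_properties(properties_text)
    io_threads = int(props.get("num.io.threads", DEFAULT_IO_THREADS))
    return io_threads > disk_count


# Broker left on the default with a 12-volume JBOD: the check fails.
print(io_threads_ok("", disk_count=12))

# Hypothetical override, value chosen only for illustration: the check passes.
print(io_threads_ok("# broker overrides\nnum.io.threads=16\n", disk_count=12))
```

The check only encodes the "greater than the number of disks" guidance; how far above the disk count to go still depends on the cores available on the host.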

Only single node works in 3 node cluster

2020-06-19 Thread linux.il
I'm using a 3-node cluster in AWS MSK. There are two issues, probably related: the cluster is very slow, and only one broker is actually processing messages. MSK doesn't provide app-level metrics, just "network packages by broker" per second, which is 350/1/1 per node. A standalone Kafka installation on

Re: Duplicate records on consumer side.

2020-06-19 Thread Ricardo Ferreira
Sunil, Kafka ensures that each partition is read by only one thread within a consumer group. Since your topic has three partitions, the rationale is that up to three threads from the consumer group will be properly served. However, though your calculation is correct (3 instances, each
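
The arithmetic in this exchange (3 partitions, 3 instances with 3 threads each) can be sketched with a toy assignment function. `assign` is a hypothetical helper that hands out partitions in turn. Kafka's real assignors are more involved, but the constraint they all respect is the one stated above: one partition, one thread within the group.

```python
def assign(partitions, consumers):
    """Toy assignment: each partition goes to exactly one consumer thread."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


# 3 instances with 3 threads each, all in the same consumer group.
threads = [f"instance{i}-thread{t}" for i in range(1, 4) for t in range(1, 4)]

three_partitions = assign([0, 1, 2], threads)
idle = [name for name, parts in three_partitions.items() if not parts]
print(f"{len(idle)} of {len(threads)} threads are idle with 3 partitions")

nine_partitions = assign(list(range(9)), threads)
idle_after = [name for name, parts in nine_partitions.items() if not parts]
print(f"{len(idle_after)} of {len(threads)} threads are idle with 9 partitions")
```

With 3 partitions only 3 of the 9 threads receive work, and raising the partition count to 9 gives every thread one partition.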

Re: Frequent consumer offset commit failures

2020-06-19 Thread Ricardo Ferreira
James, If I were you I would start by investigating what is causing these network drops between your cluster and your consumers. The following messages are some indications of this: * "Offset commit failed on partition MyTopic-53 at offset 957: The request *timed out*." * "Caused by: org.apach

Broker thread pool sizing

2020-06-19 Thread Gérald Quintana
Hello, How do you size Kafka broker thread pools, in particular num.io.threads (8 by default) and num.network.threads (3 by default) depending on the number of CPU cores available on the host? Regards, Gérald

Duplicate records on consumer side.

2020-06-19 Thread sunil chaudhari
Hi, I am using Kafka as the broker in my event data pipeline, with Filebeat as producer and Logstash as consumer. Filebeat simply pushes to Kafka. Logstash has 3 instances. Each instance uses a consumer group, say consumer_mytopic, which reads from mytopic. mytopic has 3 partitions and 2 replicas. As per my u

consumer crashes due to exception in Consumer.poll method

2020-06-19 Thread Pushkar Deole
Hi All, I don't know how to fix this, or whether just catching the exception from the Consumer.poll method will help here: there is some issue with the schema registry, due to which the consumer application received an error while deserializing the event. Now, even after the schema registry was restarted, the consumer
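
One way to reason about the question is with a toy model of a consumer whose position does not advance when deserialization fails. My understanding is that the Java consumer behaves similarly, which would explain why catching the exception alone keeps hitting the same record, but treat that as an assumption to verify against your client version. `ToyConsumer` and `consume_all` are hypothetical names, JSON stands in for the schema-registry deserializer, and nothing here is the Kafka API.

```python
import json


class ToyConsumer:
    """In-memory stand-in for a consumer reading one partition."""

    def __init__(self, raw_records):
        self._raw = raw_records
        self.position = 0

    def poll(self):
        """Return the next deserialized record, or None at the end of the log.

        If deserialization raises, the position is NOT advanced, so the
        next poll() would fail on the same record again.
        """
        if self.position >= len(self._raw):
            return None
        value = json.loads(self._raw[self.position])  # may raise ValueError
        self.position += 1
        return value

    def seek(self, offset: int):
        self.position = offset


def consume_all(consumer):
    """Catch the error AND step past the bad offset, recording what was skipped."""
    good, skipped = [], []
    while True:
        try:
            record = consumer.poll()
        except ValueError:
            skipped.append(consumer.position)
            consumer.seek(consumer.position + 1)
            continue
        if record is None:
            return good, skipped
        good.append(record)


consumer = ToyConsumer(['{"id": 1}', "not json", '{"id": 3}'])
good, skipped = consume_all(consumer)
print(good, skipped)
```

Without the `seek` call the loop would spin forever on offset 1. Whether skipping a record is acceptable, or whether it should be routed somewhere for later inspection, is an application decision.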