Re: Consume data-skewed partitions using Kafka-streams causes consumer load balancing issue.

2022-07-13 Thread Guozhang Wang
Hello Ankit, Kafka Streams's rebalance protocol tries to balance workloads based on num.partitions (more specifically, num.tasks, which is derived from the input partitions) but not on num.messages or num.bytes, so it cannot handle data skew across partitions

Consume data-skewed partitions using Kafka-streams causes consumer load balancing issue.

2022-07-07 Thread ankit Soni
Hello kafka-users, I have 50 topics, each with 32 partitions where data is being ingested continuously. Data is being published in these 50 partitions externally (no control), which causes data skew among the partitions of each topic. For example: For topic-1, partition-1 contains 100 events,

Re: Kafka stream application load balancing

2018-08-05 Thread Siva Ram
Hi Guozhang, Thanks for the suggestions below. I think we got past the REBALANCING issue. However, we are running into a significant memory usage issue. I will open a separate thread for this. 1) During the punctuate we need to perform certain tasks and it was exceeding the consumer

Re: Kafka stream application load balancing

2018-07-30 Thread Guozhang Wang
Hello Siva, To better understand your situation, I'd need to ask a few more questions: 1) What triggers your REBALANCING event? 2) Does your application contain any states? If yes, how are they configured (persistent or in-memory, is logging enabled, etc)? 3) What is your commit interval

Kafka stream application load balancing

2018-07-29 Thread Siva Ram
Hi, Kafka version 1.0.0 (can't upgrade to another version yet due to legacy dependency). The stream application uses the low-level Processor API and maintains state. A topic is set up with 30 partitions and I have split it across 2 stream application instances consuming the same topic, each with 15 threads.

Re: Question about kafka-streams task load balancing

2017-03-24 Thread Guozhang Wang
ough. Right now, you should reduce the number of threads. Load balancing is based on threads, and thus, if Streams place tasks to all threads of one machine, it will automatically assign the remaining tasks to thread of the second machine. Btw: If you have only 9 input

Re: Question about kafka-streams task load balancing

2017-03-21 Thread Matthias J. Sax
Hi, I guess it's currently not possible to load balance between different machines. It might be a nice optimization to add into Streams though. Right now, you should reduce the number of threads. Load balancing is based on threads, and thus, if Streams places tasks on all threads of one machine

Question about kafka-streams task load balancing

2017-03-21 Thread Prasad, Karthik
Hey, I have a typical scenario of a kafka-streams application in a production environment. We have a kafka-cluster with multiple topics. Messages from one topic are being consumed by the kafka-streams application. The topic, currently, has 9 partitions. We have configured consumer thread

Re: Load Balancing Kafka

2015-07-16 Thread Dana Powers
set auto.leader.rebalance.enable to true or false, you might not need to do further workload balance. However, in most cases you probably still need to do some sort of load balancing based on the traffic and disk utilization of each broker. You might want to do leader migration and/or partition

Re: Load Balancing Kafka

2015-07-15 Thread Terry Bates
for a particular partition will eventually be in-sync with the leader for a particular partition. So, I don't think you need to worry about sending your messages to VIP and having to direct where messages end up with manual load-balancing, even if your messages are assigned to a partition randomly

Re: Load Balancing Kafka

2015-07-15 Thread Jiangjie Qin
If you have pretty balanced traffic on each partition and have set auto.leader.rebalance.enable to true or false, you might not need to do further workload balance. However, in most cases you probably still need to do some sort of load balancing based on the traffic and disk utilization of each

Re: Load Balancing Kafka

2015-07-15 Thread Jiangjie Qin
auto.leader.rebalance.enable to true or false, you might not need to do further workload balance. However, in most cases you probably still need to do some sort of load balancing based on the traffic and disk utilization of each broker. You might want to do leader migration and/or partition reassignment. Leader

Load Balancing Kafka

2015-07-15 Thread Sandy Waters
Hi all, Do I need to load balance against the brokers? I am using the python driver and it seems to only want a single kafka broker host. However, in a situation where I have 10 brokers, is it still fine to just give it one host? Do zookeeper and kafka handle the load balancing and redirect

load balancing

2015-03-02 Thread sunil kalva
Is kafka load balancing based on the number of partitions of a topic or the number of partitions of all topics in a cluster? -- SunilKalva

Re: load balancing

2015-03-02 Thread Jiangjie Qin
There are two algorithms: range and round robin. The range algorithm balances each topic independently. Round robin balances across all the topics the consumer is consuming from. Jiangjie (Becket) Qin On 3/2/15, 2:05 AM, sunil kalva sambarc...@gmail.com wrote: Is kafka load balancing based
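
The difference between the two algorithms named in this reply can be sketched in a few lines of Python. This is a simplified illustration and not Kafka's actual assignor code: the function names, the consumer ids `c1`..`c4`, and the topic names `t1`..`t4` are all hypothetical, and the example assumes every consumer subscribes to every topic.

```python
def range_assign(topics, consumers):
    """Range-style sketch: divide each topic's partitions on its own."""
    consumers = sorted(consumers)
    result = {c: [] for c in consumers}
    for topic in sorted(topics):
        per, extra = divmod(topics[topic], len(consumers))
        start = 0
        for i, c in enumerate(consumers):
            # The first `extra` consumers take one additional partition.
            count = per + (1 if i < extra else 0)
            result[c].extend((topic, p) for p in range(start, start + count))
            start += count
    return result


def round_robin_assign(topics, consumers):
    """Round-robin sketch: deal all topic-partitions out across consumers."""
    consumers = sorted(consumers)
    result = {c: [] for c in consumers}
    all_parts = [(t, p) for t in sorted(topics) for p in range(topics[t])]
    for i, tp in enumerate(all_parts):
        result[consumers[i % len(consumers)]].append(tp)
    return result


# Hypothetical setup: 4 topics with one partition each, 4 consumers.
topics = {"t1": 1, "t2": 1, "t3": 1, "t4": 1}
consumers = ["c1", "c2", "c3", "c4"]
range_result = range_assign(topics, consumers)
rr_result = round_robin_assign(topics, consumers)
```

Under these assumptions the range-style sketch hands every partition to `c1`, because each topic is divided independently and its single partition always goes to the first consumer, while the round-robin sketch gives each consumer one partition.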

Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-29 Thread Jun Rao
What's the output of the ConsumerOffsetChecker tool? Thanks, Jun On Tue, Oct 28, 2014 at 7:31 AM, Natarajan, Murugavel murugavel.natara...@softwareag.com wrote: Hi, I have the following Kafka Setup Number of producer : 1 Number of topics : 1 Number of partitions : 2 Number of consumers

Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-28 Thread Natarajan, Murugavel
Hi, I have the following Kafka Setup Number of producer : 1 Number of topics : 1 Number of partitions : 2 Number of consumers : 3 (with same group id) Number of Kafka cluster : none(single Kafka server) Zookeeper.session.timeout : 1000 Producer produces messages without any specific partitioning

Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-09 Thread Neha Narkhede
With SimpleConsumer, you will have to handle leader discovery as well as zookeeper based rebalancing. You can see an example here - https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example On Wed, Oct 8, 2014 at 11:45 AM, Sharninder sharnin...@gmail.com wrote: Thanks Gwen.

Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-08 Thread Sharninder
Hi, I'm not even sure if this is a valid use-case, but I really wanted to run it by you guys. How do I load balance my consumers? For example, if my consumer machine is under load, I'd like to spin up another VM with another consumer process to keep reading messages off any topic. On similar

Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-08 Thread Sharninder
Thanks Gwen. When you're saying that I can add consumers to the same group, does that also hold true if those consumers are running on different machines? Or in different JVMs? -- Sharninder On Wed, Oct 8, 2014 at 11:35 PM, Gwen Shapira gshap...@cloudera.com wrote: If you use the high level

Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-08 Thread Gwen Shapira
If you use the high level consumer implementation, and register all consumers as part of the same group - they will load-balance automatically. When you add a consumer to the group, if there are enough partitions in the topic, some of the partitions will be assigned to the new consumer. When a

Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-08 Thread Gwen Shapira
yep. exactly. On Wed, Oct 8, 2014 at 11:07 AM, Sharninder sharnin...@gmail.com wrote: Thanks Gwen. When you're saying that I can add consumers to the same group, does that also hold true if those consumers are running on different machines? Or in different JVMs? -- Sharninder On Wed,

Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-08 Thread Gwen Shapira
Here's an example (from ConsumerOffsetChecker tool) of 1 topic (t1) and 1 consumer group (flume), each of the 3 topic partitions is being read by a different machine running the flume consumer: Group Topic Pid Offset logSize Lag Owner flume

Re: Load Balancing Consumers or Multiple consumers reading off same topic

2014-10-08 Thread Sharninder
Thanks Gwen. This really helped. Yes, Kafka is the best thing ever :) Now how would this be done with the Simple consumer? I'm guessing I'll have to maintain my own state in Zookeeper or something of that sort? On Thu, Oct 9, 2014 at 12:01 AM, Gwen Shapira gshap...@cloudera.com wrote: Here's

topics load balancing within a consumer group

2014-06-03 Thread Weide Zhang
Hi, I have a question regarding load balancing within a consumer group. Say I have a consumer group of 4 consumers which subscribe to 4 topics, each of which has one partition. Will rebalancing happen at the topic level? Or should I expect consumer 1 to have all the data? Weide

Re: topics load balancing within a consumer group

2014-06-03 Thread Jun Rao
Currently, we distribute partitions to consumers on a per topic basis. So, in your case, consumer 1 will get all the data. Thanks, Jun On Tue, Jun 3, 2014 at 3:11 PM, Weide Zhang weo...@gmail.com wrote: Hi, I have a question regarding load balancing within a consumer group. Say I have

Producer load balancing

2014-03-17 Thread Abhinav Anand
Hi, I am using the kafka producer 0.8. Each producer seems to be sending messages only to a specific broker until metadata refresh. Also I find each producer thread connected to only one broker at once. I had read that producers send messages in round robin fashion. Is there some specific

Re: Producer load balancing

2014-03-17 Thread Neha Narkhede
The behavior that you described is explained here - https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified ? Thanks, Neha On Mon, Mar 17, 2014 at 6:26 PM, Abhinav Anand ab.rv...@gmail.com wrote: Hi, I am using

High-level consumer load-balancing problem

2013-11-14 Thread hsy...@gmail.com
Hi, I have questions about the load balancing of the kafka high-level consumer. Suppose I have 4 partitions, and the producer throughput to these 4 partitions is like this: partition 0: 10MB/s, partition 1: 10MB/s, partition 2: 1MB/s, partition 3: 1MB/s. 1kMsg/s

Re: High-level consumer load-balancing problem

2013-11-14 Thread Jun Rao
The consumer load balancing logic today is pretty simple. It just tries to divide the partitions evenly among the consumers. It doesn't try to balance by load. Thanks, Jun On Thu, Nov 14, 2013 at 3:34 PM, hsy...@gmail.com hsy...@gmail.com wrote: Hi, I have questions about the load
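
A small Python sketch shows why dividing partitions evenly does not balance load. It reads the throughput figures in the question as 10, 10, 1 and 1 MB/s for partitions 0 to 3. The two-consumer group, the consumer names, and the `split_evenly` helper are assumptions made for illustration, not the consumer's real code.

```python
def split_evenly(partitions, consumers):
    """Give each consumer an equal-sized contiguous slice of partitions."""
    per, extra = divmod(len(partitions), len(consumers))
    out, start = {}, 0
    for i, consumer in enumerate(consumers):
        count = per + (1 if i < extra else 0)
        out[consumer] = partitions[start:start + count]
        start += count
    return out


# Throughput per partition in MB/s, as read from the question.
throughput_mb_s = {0: 10, 1: 10, 2: 1, 3: 1}

# Hypothetical group of two consumers.
assignment = split_evenly(sorted(throughput_mb_s), ["consumer-a", "consumer-b"])
load = {c: sum(throughput_mb_s[p] for p in ps) for c, ps in assignment.items()}
```

Both consumers end up with two partitions, yet `consumer-a` carries 20 MB/s and `consumer-b` only 2 MB/s, because the split looks only at partition counts.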

Questions about failure recovery, load balancing and producer ack/async mode

2013-10-23 Thread Shafaq
Hi, The following need more elaboration after reading the kafka docs: 1- During a leader failover, what happens to messages that are not committed to other followers, and to the messages that the producer keeps sending (in async mode) until a new leader is elected? Can the producer buffer

Re: Questions about failure recovery, load balancing and producer ack/async mode

2013-10-23 Thread Jun Rao
1,3 Take a look at request.required.acks in http://kafka.apache.org/documentation.html#producerconfigs 2. The producer does random distribution by default. However, you can provide a partitioning key and a partitioning function. For details on how consumer load balancing works, see http
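
The partitioning behaviour described in point 2 can be pictured with a short Python sketch: with no key a partition is picked at random, and with a key a partitioning function maps it to a fixed partition. The `choose_partition` helper and the CRC32-based default are assumptions for illustration only. They are not the Kafka producer API or its real hash function.

```python
import random
import zlib


def choose_partition(key, num_partitions, partition_fn=None):
    """Pick a partition: random without a key, deterministic with one."""
    if key is None:
        # No key: any partition may be chosen.
        return random.randrange(num_partitions)
    # Stand-in default: CRC32 of the key bytes modulo the partition count.
    fn = partition_fn or (lambda k, n: zlib.crc32(k) % n)
    return fn(key, num_partitions)


# The same key always maps to the same partition.
first = choose_partition(b"user-42", 4)
second = choose_partition(b"user-42", 4)

# A custom partitioning function (hypothetical): route by key length.
custom = choose_partition(b"abc", 4, partition_fn=lambda k, n: len(k) % n)

# A keyless message lands on some valid partition.
keyless = choose_partition(None, 4)
```

Keyed messages keep per-key ordering within one partition, but a few hot keys can skew traffic toward the partitions they map to.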

load-balancing consumers question

2013-08-22 Thread Michal Haris
Hi, is it possible or has anybody tried/needed to balance partitions between consumers unevenly or based on some custom function ? Ideally with Kafka 0.7 Michal Haris

Re: load-balancing consumers question

2013-08-22 Thread Guozhang Wang
Hello Michal, This FAQ entry may help you understand the rebalance logic: https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-CanIpredicttheresultsoftheconsumerrebabalance%3F In a word, since we use a deterministic range partition in rebalance, unevenly or customized partition

Does C++ client support zookeeper based producer load balancing?

2013-08-07 Thread Jan Rudert
Hi, I am starting with kafka. We use version 0.7.2 currently. Does anyone know whether automatic producer load balancing based on zookeeper is supported by the c++ client? Thank you! -- Jan

Re: Is Zookeeper already providing load balancing for Kafka brokers?

2013-07-11 Thread Jun Rao
Yes. Thanks, Jun On Wed, Jul 10, 2013 at 10:55 PM, Ryan Chan ryanchan...@gmail.com wrote: We are already using zk.connect to connect zookeeper and registered multiple brokers (same topic/partitions), so when a consumer request ZK, is load balancing already done? Thanks

Re: Is Zookeeper already providing load balancing for Kafka brokers?

2013-07-11 Thread Ryan Chan
...@gmail.com wrote: We are already using zk.connect to connect zookeeper and registered multiple brokers (same topic/partitions), so when a consumer request ZK, is load balancing already done? Thanks

Re: Is Zookeeper already providing load balancing for Kafka brokers?

2013-07-11 Thread Jun Rao
: We are already using zk.connect to connect zookeeper and registered multiple brokers (same topic/partitions), so when a consumer request ZK, is load balancing already done? Thanks

Re: About kafka 0.8 producer zookeeper-based load balancing on per-request basis

2013-01-15 Thread Chris Riccomini
key word. That is to say ,a certain must be sent to a fixed partition on a fixed broker. How the so called load balancing works? Best Regards

Re: About kafka 0.8 producer zookeeper-based load balancing on per-request basis

2013-01-15 Thread Neha Narkhede
When the producer tries to send to the old broker (which is either dead, or a slave now), the broker will either not respond, or the response will contain an error code. In this case, the broker sends a response with an error code to the producer, and then the producer retries the metadata