Hello Ankit,
Kafka Streams' rebalance protocol balances the workload based on the number of
partitions (more specifically, the number of tasks, which is derived from the
input partitions), not on the number of messages or bytes, so it cannot handle
data skew across partitions.
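The toy model below illustrates that point. It is an illustration, not the actual Kafka Streams assignor. Tasks are spread evenly by count, one task per input partition, and the model has no knowledge of how many bytes each partition carries. The partition sizes and instance names are hypothetical.

```python
def assign_by_count(task_ids, instances):
    """Deal tasks out in turn so every instance gets (nearly) the same number of tasks."""
    assignment = {inst: [] for inst in instances}
    for i, task in enumerate(sorted(task_ids)):
        assignment[instances[i % len(instances)]].append(task)
    return assignment


def bytes_per_instance(assignment, bytes_per_task):
    """Sum the (hypothetical) data volume each instance ends up processing."""
    return {inst: sum(bytes_per_task[t] for t in tasks) for inst, tasks in assignment.items()}


# Hypothetical skewed input: task 0 (partition 0) carries far more data than the others.
bytes_per_task = {0: 1000, 1: 10, 2: 10, 3: 10}
assignment = assign_by_count(bytes_per_task, ["instance-a", "instance-b"])

print({inst: len(tasks) for inst, tasks in assignment.items()})  # {'instance-a': 2, 'instance-b': 2}
print(bytes_per_instance(assignment, bytes_per_task))             # {'instance-a': 1010, 'instance-b': 20}
```

The task counts come out equal, while the data volume differs by roughly 50x. That skew is invisible to a count-based assignment.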
Hello kafka-users,
I have 50 topics, each with 32 partitions where data is being ingested
continuously.
Data is published to these 50 topics externally (we have no control over it),
which causes data skew among the partitions of each topic.
For example, for topic-1, partition-1 contains 100 events,
Hi Guozhang,
Thanks for the suggestions below. I think we are past the REBALANCING
issue. However, we are running into a significant memory usage issue. I
will open a separate thread for this.
1) During the punctuate we need to perform certain tasks, and it was
exceeding the consumer
Hello Siva,
To better understand your situation, I'd need to ask a few more questions:
1) What triggers your REBALANCING event?
2) Does your application contain any states? If yes, how are they
configured (persistent or in-memory, is logging enabled, etc)?
3) What is your commit interval
Hi,
Kafka version 1.0.0 (can't upgrade to another version yet due to legacy
dependency)
The stream application uses the low-level Processor API and maintains state. A
topic is set up with 30 partitions, and I split the work across 2 stream
application instances consuming the same topic, each with 15 threads.
>
> Right now, you should reduce the number of threads. Load balancing is
> based on threads, and thus, if Streams places tasks on all threads of one
> machine, it will automatically assign the remaining tasks to threads of
> the second machine.
>
> Btw: If you have only 9 input
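The toy model below mirrors the thread-based behaviour described in the quote. It rests on an assumption made for illustration and is not the real StickyTaskAssignor. A task stays on the machine that already runs it as long as that machine has a free thread. Only the overflow goes to the machine with the most free threads. Machine names and task counts are hypothetical.

```python
def assign_tasks(tasks, threads, previous_owner):
    """Return how many tasks each machine ends up with.

    threads:        machine -> number of stream threads (its capacity)
    previous_owner: task -> machine that ran the task before the rebalance
    """
    load = {machine: 0 for machine in threads}
    overflow = []
    for task in tasks:
        owner = previous_owner.get(task)
        if owner in load and load[owner] < threads[owner]:
            load[owner] += 1          # keep the task where it already runs
        else:
            overflow.append(task)
    for _ in overflow:
        # hand each remaining task to the machine with the most free threads
        target = max(threads, key=lambda m: threads[m] - load[m])
        load[target] += 1
    return load


tasks = list(range(9))                          # 9 input partitions -> 9 tasks
previous = {t: "machine-1" for t in tasks}      # machine-1 started first and took everything

print(assign_tasks(tasks, {"machine-1": 9, "machine-2": 9}, previous))  # {'machine-1': 9, 'machine-2': 0}
print(assign_tasks(tasks, {"machine-1": 5, "machine-2": 5}, previous))  # {'machine-1': 5, 'machine-2': 4}
```

With 9 threads, the first machine never runs out of capacity, so nothing moves. With fewer threads per machine, the overflow is forced onto the second machine.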
Hi,
I guess it's currently not possible to load balance between different
machines. It might be a nice optimization to add to Streams, though.
Hey,
I have a typical scenario of a kafka-streams application in a production
environment.
We have a Kafka cluster with multiple topics. Messages from one topic are being
consumed by the kafka-streams application. The topic currently has 9
partitions. We have configured consumer thread
for a particular
partition will eventually be in-sync with the leader for a particular
partition. So, I don't think you need to worry about sending your messages
to VIP and having to direct where messages end up with manual
load-balancing, even if your messages are assigned to a partition randomly
If you have pretty balanced traffic on each partition and have set
auto.leader.rebalance.enable to true or false, you might not need to do
further workload balancing.
However, in most cases you probably still need to do some sort of load
balancing based on the traffic and disk utilization of each broker. You
might want to do leader migration and/or partition reassignment.
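As far as I understand the broker-side check behind auto.leader.rebalance.enable, it works as follows. The controller treats the first replica of each partition as its preferred leader. For each broker, it computes the fraction of that broker's preferred partitions that are currently led by another broker. If that fraction exceeds leader.imbalance.per.broker.percentage (10 by default), the controller triggers a preferred leader election.

The plain-Python sketch below models only that calculation. The partition layout is hypothetical.

```python
from collections import Counter


def imbalanced_brokers(partitions, threshold_pct=10):
    """partitions: name -> (replica list, current leader).

    Returns broker -> True if it would be due for a preferred leader election.
    """
    preferred_total = Counter()
    not_leading = Counter()
    for replicas, leader in partitions.values():
        preferred = replicas[0]            # first replica = preferred leader
        preferred_total[preferred] += 1
        if leader != preferred:
            not_leading[preferred] += 1
    return {
        broker: not_leading[broker] / total > threshold_pct / 100
        for broker, total in preferred_total.items()
    }


# Hypothetical layout: broker 1 restarted, so broker 2 took over leadership of t-1.
layout = {
    "t-0": ([1, 2], 1),
    "t-1": ([1, 2], 2),
    "t-2": ([2, 1], 2),
    "t-3": ([2, 1], 2),
}
print(imbalanced_brokers(layout))  # {1: True, 2: False}
```

This only moves leadership back to the preferred replicas and does not move any data. That is why balancing by traffic or disk usage still needs partition reassignment.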
Leader
Hi all,
Do I need to load balance against the brokers? I am using the python
driver and it seems to only want a single Kafka broker host. However, in a
situation where I have 10 brokers, is it still fine to just give it one
host? Do ZooKeeper and Kafka handle the load balancing and redirect
Is Kafka load balancing based on the number of partitions of a topic or the
number of partitions of all topics in a cluster?
--
SunilKalva
There are two algorithms: range and round robin.
The range algorithm balances each topic independently.
The round-robin algorithm balances across all the topics the consumer is consuming from.
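Below is a simplified sketch of the two strategies. It assumes every consumer subscribes to the same topics. The real assignors may order partitions and consumers differently, so treat the exact placement as illustrative. Topic and consumer names are hypothetical.

```python
def range_assign(consumers, partitions_per_topic):
    """Range: each topic is split independently into contiguous chunks."""
    consumers = sorted(consumers)
    assignment = {c: [] for c in consumers}
    for topic in sorted(partitions_per_topic):
        per_consumer, extra = divmod(partitions_per_topic[topic], len(consumers))
        start = 0
        for i, c in enumerate(consumers):
            count = per_consumer + (1 if i < extra else 0)
            assignment[c].extend((topic, p) for p in range(start, start + count))
            start += count
    return assignment


def round_robin_assign(consumers, partitions_per_topic):
    """Round robin: all topic-partitions are pooled, then dealt out one at a time."""
    consumers = sorted(consumers)
    assignment = {c: [] for c in consumers}
    pool = [(t, p) for t in sorted(partitions_per_topic) for p in range(partitions_per_topic[t])]
    for i, tp in enumerate(pool):
        assignment[consumers[i % len(consumers)]].append(tp)
    return assignment


topics = {"t1": 3, "t2": 3}   # hypothetical: two topics with 3 partitions each
group = ["c1", "c2"]
print({c: len(tps) for c, tps in range_assign(group, topics).items()})        # {'c1': 4, 'c2': 2}
print({c: len(tps) for c, tps in round_robin_assign(group, topics).items()})  # {'c1': 3, 'c2': 3}
```

With range, the leftover partition of every topic lands on the same first consumer, so the imbalance accumulates across topics. Round robin pools the partitions first and spreads that remainder out.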
Jiangjie (Becket) Qin
On 3/2/15, 2:05 AM, sunil kalva sambarc...@gmail.com wrote:
Is kafka load balancing based
What's the output of the ConsumerOffsetChecker tool?
Thanks,
Jun
On Tue, Oct 28, 2014 at 7:31 AM, Natarajan, Murugavel
murugavel.natara...@softwareag.com wrote:
Hi,
I have the following Kafka Setup
Number of producer : 1
Number of topics : 1
Number of partitions : 2
Number of consumers : 3 (with same group id)
Number of Kafka clusters : none (single Kafka server)
Zookeeper.session.timeout : 1000
Producer produces messages without any specific partitioning
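Within a consumer group, each partition is consumed by at most one consumer. So 2 partitions shared among 3 consumers always leave one consumer without a partition.

A minimal illustration follows. The order in which partitions are handed out is an assumption; only the counts matter.

```python
def assign(partitions, consumers):
    """Give every partition to exactly one consumer of the group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


assignment = assign([0, 1], ["consumer-1", "consumer-2", "consumer-3"])
idle = [c for c, parts in assignment.items() if not parts]
print(idle)  # ['consumer-3']
```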
With SimpleConsumer, you will have to handle leader discovery as well as
zookeeper based rebalancing. You can see an example here -
https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
On Wed, Oct 8, 2014 at 11:45 AM, Sharninder sharnin...@gmail.com wrote:
Thanks Gwen.
Hi,
I'm not even sure if this is a valid use-case, but I really wanted to run
it by you guys. How do I load balance my consumers? For example, if my
consumer machine is under load, I'd like to spin up another VM with another
consumer process to keep reading messages off any topic. On similar
Thanks Gwen.
When you're saying that I can add consumers to the same group, does that
also hold true if those consumers are running on different machines? Or in
different JVMs?
--
Sharninder
On Wed, Oct 8, 2014 at 11:35 PM, Gwen Shapira gshap...@cloudera.com wrote:
If you use the high level consumer implementation, and register all
consumers as part of the same group - they will load-balance
automatically.
When you add a consumer to the group, if there are enough partitions
in the topic, some of the partitions will be assigned to the new
consumer.
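Below is a small sketch of that rebalance. It assumes a single topic with 6 partitions (a hypothetical number) and a range-style contiguous split. When a second consumer process joins the group, on the same machine or a different one, half of the partitions move to it.

```python
def split(num_partitions, consumers):
    """Contiguous, near-even split of one topic's partitions across sorted consumers."""
    consumers = sorted(consumers)
    per, extra = divmod(num_partitions, len(consumers))
    result, start = {}, 0
    for i, c in enumerate(consumers):
        count = per + (1 if i < extra else 0)
        result[c] = list(range(start, start + count))
        start += count
    return result


before = split(6, ["vm-1"])
after = split(6, ["vm-1", "vm-2"])       # vm-2 is the newly added consumer
moved = [p for p in before["vm-1"] if p not in after["vm-1"]]
print(after)  # {'vm-1': [0, 1, 2], 'vm-2': [3, 4, 5]}
print(moved)  # [3, 4, 5]
```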
When a
yep. exactly.
On Wed, Oct 8, 2014 at 11:07 AM, Sharninder sharnin...@gmail.com wrote:
Thanks Gwen.
When you're saying that I can add consumers to the same group, does that
also hold true if those consumers are running on different machines? Or in
different JVMs?
--
Sharninder
On Wed,
Here's an example (from the ConsumerOffsetChecker tool) of 1 topic (t1)
and 1 consumer group (flume); each of the 3 topic partitions is being
read by a different machine running the flume consumer:
Group  Topic  Pid  Offset  logSize  Lag  Owner
flume
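In that output, Lag is logSize minus Offset for each partition. The Owner column shows which consumer instance holds the partition.

Below is a small sketch of reading such rows. The numbers and owner names are hypothetical and are not taken from the original output.

```python
from dataclasses import dataclass


@dataclass
class PartitionRow:
    group: str
    topic: str
    pid: int
    offset: int      # last offset committed by the group
    log_size: int    # latest offset written to the partition
    owner: str

    @property
    def lag(self) -> int:
        return self.log_size - self.offset


# Hypothetical rows: one topic, 3 partitions, each owned by a different machine.
rows = [
    PartitionRow("flume", "t1", 0, 1000, 1010, "host-a"),
    PartitionRow("flume", "t1", 1, 2000, 2000, "host-b"),
    PartitionRow("flume", "t1", 2, 1500, 1550, "host-c"),
]
print([r.lag for r in rows])              # [10, 0, 50]
print(len({r.owner for r in rows}) == 3)  # True: every partition has its own reader
```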
Thanks Gwen. This really helped.
Yes, Kafka is the best thing ever :)
Now how would this be done with the Simple consumer? I'm guessing I'll have
to maintain my own state in Zookeeper or something of that sort?
On Thu, Oct 9, 2014 at 12:01 AM, Gwen Shapira gshap...@cloudera.com wrote:
Here's
Hi,
I have a question regarding load balancing within a consumer group.
Say I have a consumer group of 4 consumers that subscribe to 4 topics,
each of which has one partition. Will rebalancing happen at the
topic level, or should I expect consumer 1 to get all the data?
Weide
Currently, we distribute partitions to consumers on a per-topic basis. So,
in your case, consumer 1 will get all the data.
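The minimal sketch below shows per-topic assignment for that exact case: 4 single-partition topics and 4 consumers, with hypothetical names. Each topic is split on its own, so a topic with one partition always goes to the first consumer in sorted order.

```python
def per_topic_assign(consumers, topics):
    """Split each topic's partitions over the sorted consumers, one topic at a time."""
    consumers = sorted(consumers)
    assignment = {c: [] for c in consumers}
    for topic in sorted(topics):
        per, extra = divmod(topics[topic], len(consumers))  # 1 partition, 4 consumers -> (0, 1)
        start = 0
        for i, c in enumerate(consumers):
            count = per + (1 if i < extra else 0)            # only the first consumer gets one
            assignment[c].extend(f"{topic}-{p}" for p in range(start, start + count))
            start += count
    return assignment


group = ["consumer-1", "consumer-2", "consumer-3", "consumer-4"]
topics = {"topic-a": 1, "topic-b": 1, "topic-c": 1, "topic-d": 1}
print(per_topic_assign(group, topics))
```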
Thanks,
Jun
On Tue, Jun 3, 2014 at 3:11 PM, Weide Zhang weo...@gmail.com wrote:
Hi,
I have a question regarding load balancing within a consumer group.
Say I have
Hi,
I am using the Kafka 0.8 producer. Each producer seems to be sending
messages only to a specific broker until the metadata refresh. I also find each
producer thread connected to only one broker at a time.
I had read that the producer sends messages in a round-robin fashion. Is there
some specific
The behavior that you described is explained here -
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified
?
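As I understand the FAQ entry, when no key is given, the 0.8 producer picks a random partition and sticks to it until the next metadata refresh. The refresh is time-based; I believe it is controlled by topic.metadata.refresh.interval.ms, which defaults to 10 minutes. That is why each producer appears to talk to one broker at a time.

Below is a toy model that counts messages instead of time. The class name is hypothetical.

```python
import random


class StickyRandomPartitioner:
    """Toy model (assumption) of null-key partitioning in the 0.8 producer:
    choose a random partition and keep it until the next metadata refresh."""

    def __init__(self, num_partitions, refresh_every, rng=None):
        self.num_partitions = num_partitions
        self.refresh_every = refresh_every      # messages between refreshes; the real trigger is time-based
        self.rng = rng if rng is not None else random.Random()
        self.sent = 0
        self.current = self.rng.randrange(num_partitions)

    def partition(self):
        if self.sent and self.sent % self.refresh_every == 0:
            self.current = self.rng.randrange(self.num_partitions)  # simulated metadata refresh
        self.sent += 1
        return self.current


p = StickyRandomPartitioner(num_partitions=8, refresh_every=1000, rng=random.Random(7))
picks = [p.partition() for _ in range(3000)]
# Within each refresh window every message goes to the same partition.
print(all(len(set(picks[i:i + 1000])) == 1 for i in range(0, 3000, 1000)))  # True
```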
Thanks,
Neha
On Mon, Mar 17, 2014 at 6:26 PM, Abhinav Anand ab.rv...@gmail.com wrote:
Hi,
I am using
Hi,
I have questions about the load balancing of the Kafka high-level consumer.
Suppose I have 4 partitions,
and the producer throughput to these 4 partitions is as follows:
Partition:   0        1        2       3
Throughput:  10MB/s   10MB/s   1MB/s   1MB/s
1kMsg/s
The consumer load balancing logic today is pretty simple. It just tries to
divide the partitions evenly among the consumers. It doesn't try to balance
by load.
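The sketch below applies that logic to the throughput figures in the question. It assumes 2 consumers; the question does not give the consumer count, so that number is hypothetical. An even, count-based split gives one consumer both 10MB/s partitions. The load-aware split is shown only for comparison and is not something the consumer does.

```python
throughput_mb_s = {0: 10, 1: 10, 2: 1, 3: 1}  # partition -> MB/s, from the question


def even_by_count(partitions, consumers):
    """Count-based, contiguous split: every consumer gets the same number of partitions."""
    per, extra = divmod(len(partitions), len(consumers))
    result, start = {}, 0
    for i, c in enumerate(consumers):
        count = per + (1 if i < extra else 0)
        result[c] = partitions[start:start + count]
        start += count
    return result


def load(assignment):
    """Total MB/s each consumer has to keep up with."""
    return {c: sum(throughput_mb_s[p] for p in parts) for c, parts in assignment.items()}


count_based = even_by_count(sorted(throughput_mb_s), ["consumer-a", "consumer-b"])
print(load(count_based))  # {'consumer-a': 20, 'consumer-b': 2}

# For comparison only: a hypothetical load-aware split of the same partitions.
print(load({"consumer-a": [0, 2], "consumer-b": [1, 3]}))  # {'consumer-a': 11, 'consumer-b': 11}
```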
Thanks,
Jun
On Thu, Nov 14, 2013 at 3:34 PM, hsy...@gmail.com hsy...@gmail.com wrote:
Hi,
I have questions about the load
Hi,
The following points need more elaboration after reading the Kafka docs:
1- During a leader failover, what happens to messages that have not been
committed to the other followers, and to the messages that the producer keeps
sending (in async mode) until the new leader is elected? Can the producer
buffer
1,3 Take a look at request.required.acks in
http://kafka.apache.org/documentation.html#producerconfigs
2. The producer does random distribution by default. However, you can
provide a partitioning key and a partitioning function. For details on how
consumer load balancing works, see
http
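Below is a minimal sketch of the idea in point 2, using a hypothetical helper named choose_partition. Unkeyed messages are spread randomly, while a key is hashed so that the same key always lands on the same partition. zlib.crc32 is only a stand-in hash here; Kafka's own partitioners use different hash functions.

```python
import random
import zlib


def choose_partition(key, num_partitions, rng=random):
    """Pick a partition for one message.

    key: bytes or None. num_partitions: partition count of the topic.
    """
    if key is None:
        return rng.randrange(num_partitions)      # no key: random distribution
    return zlib.crc32(key) % num_partitions        # same key -> same partition


print(choose_partition(b"user-42", 6) == choose_partition(b"user-42", 6))  # True
print(0 <= choose_partition(None, 6) < 6)                                  # True
```

The trade-off is that keyed traffic follows the key distribution. A hot key keeps one partition, and the broker leading it, busy regardless of how many brokers the cluster has.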
Hi, is it possible, or has anybody tried or needed, to balance partitions
between consumers unevenly or based on some custom function? Ideally with
Kafka 0.7.
Michal Haris
Hello Michal,
This FAQ entry may help you understand the rebalance logic:
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-CanIpredicttheresultsoftheconsumerrebabalance%3F
In a word, since we use a deterministic range partition in rebalance,
unevenly or customized partition
Hi,
I am starting with Kafka. We currently use version 0.7.2. Does anyone know
whether automatic producer load balancing based on ZooKeeper is supported by
the C++ client?
Thank you!
-- Jan
Yes.
Thanks,
Jun
On Wed, Jul 10, 2013 at 10:55 PM, Ryan Chan ryanchan...@gmail.com wrote:
We are already using zk.connect to connect to ZooKeeper and have registered
multiple brokers (same topic/partitions), so when a consumer requests ZK, is
load balancing already done?
Thanks
key word. That is to say, a certain
must be sent to a fixed partition on a fixed broker. How does the so-called
load balancing work?
Best Regards
When the producer
tries to send to the old broker (which is either dead, or a slave now),
the broker will either not respond, or the response will contain an error
code.
In this case, the broker sends a response with an error code to the
producer, and then the producer retries the metadata
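Below is a toy model of the error-code path described here. All class names are hypothetical, and the no-response (timeout) case is not modelled. The producer keeps sending to its cached leader until it gets an error response. It then refreshes its metadata and retries against the new leader.

```python
class NotLeaderError(Exception):
    """Stand-in for the error code returned by a broker that is no longer the leader."""


class ToyCluster:
    def __init__(self, leader):
        self.leader = leader          # id of the broker currently leading the partition

    def produce(self, broker, message):
        if broker != self.leader:
            raise NotLeaderError(broker)
        return f"{message} written by broker {broker}"


class ToyProducer:
    def __init__(self, cluster, max_retries=3):
        self.cluster = cluster
        self.max_retries = max_retries
        self.cached_leader = cluster.leader   # metadata fetched at start-up
        self.metadata_refreshes = 0

    def send(self, message):
        for _ in range(self.max_retries + 1):
            try:
                return self.cluster.produce(self.cached_leader, message)
            except NotLeaderError:
                # Error response: refresh metadata to learn the new leader, then retry.
                self.cached_leader = self.cluster.leader
                self.metadata_refreshes += 1
        raise RuntimeError(f"giving up on {message!r} after {self.max_retries} retries")


cluster = ToyCluster(leader=1)
producer = ToyProducer(cluster)
cluster.leader = 2                       # leader fails over; the producer's metadata is now stale
print(producer.send("m1"))               # m1 written by broker 2
print(producer.metadata_refreshes)       # 1
```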