I would like to revive an older thread about auto-generating broker IDs. As
an AWS user, I would like Kafka to just use the instance's ID, the instance's
IP, or the instance's internal domain (whichever is easier). This would mean I
can easily clone from an AMI to launch Kafka instances without having to
Has anyone been able to install and start Kafka 0.8 as a supervised service
so that it comes back up after a crash/reboot/etc?
There are two types of consumer clients in Kafka: ZookeeperConsumerConnector
and SimpleConsumer. Only the former has the rebalancing logic.
Thanks,
Neha
On Oct 1, 2013 6:30 AM, Kane Kane kane.ist...@gmail.com wrote:
But it looks like some clients don't implement it?
Right. It is currently a Java integer. However, as per the previous thread, it
seems possible to change it to a string. In that case, we can use instance
IDs, IP addresses, custom ID generators, etc.
How are you currently generating broker IDs from the IP address? With a Chef
script or a custom shell script?
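As a rough illustration of the IP-based approach discussed above, here is a
minimal sketch that derives a non-negative integer from the host bits of an
instance's private IPv4 address, so that the result fits in a Java integer.
The function name and the default prefix length are hypothetical, nothing
here is provided by Kafka, and the result is only unique if all brokers sit
in the same subnet.

```python
import ipaddress


def broker_id_from_ip(ip: str, prefix_len: int = 16) -> int:
    """Return the host portion of an IPv4 address as a non-negative integer.

    Assumption: every broker lives in the same subnet of size prefix_len,
    so the host bits alone are enough to tell brokers apart.
    """
    if not 1 <= prefix_len <= 32:
        # A prefix of at least 1 bit keeps the result within 31 bits,
        # i.e. inside the positive range of a signed 32-bit integer.
        raise ValueError("prefix_len must be between 1 and 32")
    addr = ipaddress.IPv4Address(ip)
    host_bits = 32 - prefix_len
    # Mask off the network bits and keep only the host bits.
    return int(addr) & ((1 << host_bits) - 1)


if __name__ == "__main__":
    print(broker_id_from_ip("10.0.3.17"))
```

The printed value could then be written into the broker's configuration by
whatever provisioning script launches the instance.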
On 1
Yeah, I noticed that. I'm curious how balancing happens if SimpleConsumer
is used. I.e., I can provide a partition to read from if I use
SimpleConsumer, but what if someone else is already attached to that
partition? What would happen? Also, what would happen if one SimpleConsumer
attached to all
I have a use case where thousands of servers send status-type messages,
which I am currently handling in real time without any kind of queueing
system. Currently, when I receive a message, I compute an MD5 hash of the
message and perform a lookup in my database to see if this is a duplicate; if
not, I
Batch processing will increase the throughput but also increase latency.
How much latency can your real-time processing tolerate?
One thing you could try is to use keyed messages, with the MD5 hash of your
message as the key. Kafka has a deduplication mechanism on the brokers
that dedups messages
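To make the hash-as-key idea concrete, here is a minimal sketch of computing
an MD5 key for a message and using it for a duplicate check. The names
message_key and SeenFilter are hypothetical, and the in-memory set is only a
stand-in for the database lookup described in the original question; no Kafka
API is called here.

```python
import hashlib


def message_key(payload: bytes) -> str:
    """Return the hex MD5 digest of the payload, usable as a message key."""
    return hashlib.md5(payload).hexdigest()


class SeenFilter:
    """Tracks which message keys have been seen (in memory, for illustration)."""

    def __init__(self) -> None:
        self._seen = set()

    def is_new(self, payload: bytes) -> bool:
        """Return True the first time a payload is seen, False afterwards."""
        key = message_key(payload)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


if __name__ == "__main__":
    seen = SeenFilter()
    for status in (b"host1 OK", b"host2 DOWN", b"host1 OK"):
        label = "new" if seen.is_new(status) else "duplicate"
        print(message_key(status), label)
```

The same key could be attached to each message when producing, so that
identical status messages always carry an identical key.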
I do not understand your question. What are you trying to implement?
On Tue, Oct 1, 2013 at 8:42 AM, Kane Kane kane.ist...@gmail.com wrote:
So essentially you can't do the queue pattern unless you somehow implement
locking on the client?
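One way to avoid client-side locking with SimpleConsumer-style clients is to
give each consumer a fixed, non-overlapping set of partitions up front. The
sketch below shows such a static assignment; the function name and parameters
are hypothetical, and this is not Kafka's rebalancing logic. In particular,
it does nothing when a consumer dies, so its partitions go unread until it
is restarted.

```python
def assign_partitions(num_partitions: int, consumer_index: int,
                      num_consumers: int) -> list:
    """Return the partitions owned by one consumer under a static scheme.

    Partition p goes to the consumer whose index equals p % num_consumers,
    so no two consumers ever read the same partition.
    """
    if num_consumers <= 0:
        raise ValueError("num_consumers must be positive")
    if not 0 <= consumer_index < num_consumers:
        raise ValueError("consumer_index must be in [0, num_consumers)")
    return [p for p in range(num_partitions)
            if p % num_consumers == consumer_index]


if __name__ == "__main__":
    # Three consumers sharing a topic with eight partitions.
    for index in range(3):
        print(index, assign_partitions(8, index, 3))
```

Each consumer process would be started with its own index and would only
fetch from the partitions returned for it.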
On Tue, Oct 1, 2013 at 8:35 AM, Guozhang Wang
Hi team,
Here is a use case: assume each host in a Kafka cluster has a gigabit network
adaptor, the incoming traffic is 0.8 Gbps, and at one point all the traffic
goes to one host. The remaining bandwidth is not enough for the followers to
replicate messages from this leader.
To make sure no
The reason I was asking is that this library seems to support only
SimpleConsumer (https://github.com/mumrah/kafka-python/). I was curious
whether all of this has to be implemented on the client, or whether Kafka has
some rebalancing logic and prevents consuming from the same queue on the
server side in case of
This is a capacity planning issue. I think the right thing to do here is to
expand the cluster and use the partition reassignment tool to move some
partitions over to the new brokers to evenly spread out the load.
Thanks,
Neha
On Tue, Oct 1, 2013 at 8:53 AM, Yu, Libo libo...@citi.com wrote:
We do plan to move the group membership over to the server side and have a
very thin consumer client. The proposal is here -
https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite#ClientRewrite-ConsumerAPI
and this is being planned for the 0.9 release. Once this is complete, the
non-Java
Thanks! The direction in that proposal looks very good. I wish it were
implemented already.
On Tue, Oct 1, 2013 at 9:01 AM, Neha Narkhede neha.narkh...@gmail.com wrote:
We do plan to move the group membership over to the server side and have a
very thin consumer client. The proposal is here -
It is recommended you use the iterator() API since that invokes Kafka's
ConsumerIterator which has state management logic for consuming Kafka
messages properly. If you use toIterator(), it just gives you a plain Scala
iterator over KafkaStream.
Thanks,
Neha
On Tue, Oct 1, 2013 at 6:03 AM,
Btw, is it expected to be released on Oct 31?
Thanks!
On Tue, Oct 1, 2013 at 9:01 AM, Neha Narkhede neha.narkh...@gmail.com wrote:
We do plan to move the group membership over to the server side and have a
very thin consumer client. The proposal is here -
Kane,
I'm the creator of kafka-python, just thought I'd give some insight.
Consumer rebalancing is actually pretty tricky to get right. It requires
interaction with ZooKeeper which (though possible via kazoo) is
something I've tried to avoid in kafka-python. It also seems a little
strange to
Thanks for the reply, David. Your library is great, and indeed the rebalancing
is currently somewhat quirky and complicated. I guess it doesn't make
sense to implement it, considering 0.9 is planned relatively soon.
On Tue, Oct 1, 2013 at 10:09 AM, David Arthur mum...@gmail.com wrote:
Kane,
Hello All,
I am currently using a 5-node Kafka cluster running version 0.7.2. I would
like to get some advice on the optimal number of brokers on each Kafka host.
Below is the specification of each machine:
- 4 data directories (/data1, /data2, /data3, /data4) with 200+ GB usable
space. RAID10
- 24-core CPU
-
1) Will setting 4 brokers per host with different ports and different log
data directories be beneficial to use all the available space?
2) Will there be any disadvantage to using multiple brokers on the same host?
It is recommended that you do not deploy multiple brokers on the same box
since that will
Thank you Neha for the suggestion.
On Tue, Oct 1, 2013 at 1:50 PM, Neha Narkhede neha.narkh...@gmail.com wrote:
1) Will setting 4 brokers per host with different ports and different log
data directories be beneficial to use all the available space?
2) Will there be any disadvantage using