Strategies for auto generating broker ID

2013-10-01 Thread Aniket Bhatnagar
I would like to revive an older thread around auto generating broker ID. As an AWS user, I would like Kafka to just use the instance's ID or instance's IP or instance's internal domain (whichever is easier). This would mean I can easily clone from an AMI to launch Kafka instances without having to

Running Kafka 0.8 as supervised service

2013-10-01 Thread Aniket Bhatnagar
Has anyone been able to install and start Kafka 0.8 as a supervised service so that it comes back up after a crash/reboot/etc?

Re: as i understand rebalance happens on client side

2013-10-01 Thread Neha Narkhede
There are 2 types of consumer clients in Kafka - ZookeeperConsumerConnector and SimpleConsumer. Only the former has the rebalancing logic. Thanks, Neha On Oct 1, 2013 6:30 AM, Kane Kane kane.ist...@gmail.com wrote: But it looks like some clients don't implement it?

Re: Strategies for auto generating broker ID

2013-10-01 Thread Aniket Bhatnagar
Right. It is currently a Java integer. However, as per the previous thread, it seems possible to change it to a string. In that case, we can use instance IDs, IP addresses, custom ID generators, etc. How are you currently generating broker IDs from IP address? Chef script or custom shell script? On 1
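One way to derive an integer broker ID from an instance's IP address, as asked about above, can be sketched as follows. This is only an illustration: `broker_id_from_ip` is a hypothetical helper, not part of Kafka or of any script mentioned in the thread. The 31-bit mask is an assumption made so that the result fits in a non-negative Java integer; two addresses that differ only in their top bit would collide.

```python
import ipaddress

# Largest value a signed 32-bit Java int can hold.
JAVA_INT_MAX = 2**31 - 1


def broker_id_from_ip(ip: str) -> int:
    """Map an IPv4 address to a non-negative 32-bit signed integer.

    Hypothetical helper: the address is read as an unsigned 32-bit number
    and the top bit is masked off so the result fits in a Java int.
    """
    addr = int(ipaddress.IPv4Address(ip))
    return addr & JAVA_INT_MAX


if __name__ == "__main__":
    # Example private address; the printed value would be used as the broker ID.
    print(broker_id_from_ip("10.0.0.1"))
```

A startup script could write the returned value into the broker's configuration before launching Kafka, so that instances cloned from the same image end up with different IDs.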

Re: as i understand rebalance happens on client side

2013-10-01 Thread Kane Kane
Yeah, I noticed that. I'm curious how balancing happens if SimpleConsumer is used. I.e., I can provide a partition to read from if I use SimpleConsumer, but what if someone else is already attached to that partition? What would happen? Also what would happen if one SimpleConsumer attached to all

use case with high rate of duplicate messages

2013-10-01 Thread S Ahmed
I have a use case where thousands of servers send status type messages, which I am currently handling in real time without any kind of queueing system. Currently, when I receive a message, I compute an MD5 hash of the message and perform a lookup in my database to see if this is a duplicate; if not, I
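The hash-and-lookup check described above can be sketched as follows. This is a minimal illustration, not the poster's implementation: `DuplicateFilter` is a hypothetical name, and an in-memory set stands in for the database lookup mentioned in the message.

```python
import hashlib


class DuplicateFilter:
    """Track MD5 digests of messages already seen (in-memory stand-in for a database)."""

    def __init__(self):
        self._seen = set()

    def is_new(self, message: bytes) -> bool:
        """Return True the first time a message is seen, False for later duplicates."""
        digest = hashlib.md5(message).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True


if __name__ == "__main__":
    dedup = DuplicateFilter()
    for msg in [b"host1: ok", b"host1: ok", b"host2: ok"]:
        # Only the first occurrence of each distinct message passes the filter.
        print(msg, dedup.is_new(msg))
```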

Re: use case with high rate of duplicate messages

2013-10-01 Thread Guozhang Wang
Batch processing will increase the throughput but also increase latency. How much latency can your real-time processing tolerate? One thing you could try is to use keyed messages, with the key being the md5 hash of your message. Kafka has a deduplication mechanism on the brokers that deduplicates messages
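The keying idea above can be sketched without any Kafka client. The names `keyed_message` and `partition_for` are hypothetical, and the modulo mapping below is an assumption for illustration only, not Kafka's own partitioner. The point it shows is that identical payloads always produce identical keys.

```python
import hashlib


def keyed_message(payload: bytes):
    """Return a (key, payload) pair whose key is the MD5 hex digest of the payload."""
    key = hashlib.md5(payload).hexdigest()
    return key, payload


def partition_for(key: str, num_partitions: int) -> int:
    """Illustrative mapping from a hex key to a partition index (not Kafka's partitioner)."""
    return int(key, 16) % num_partitions


if __name__ == "__main__":
    key_a, _ = keyed_message(b"status: ok")
    key_b, _ = keyed_message(b"status: ok")
    # Duplicate payloads share a key, so they map to the same partition index.
    print(key_a == key_b, partition_for(key_a, 8) == partition_for(key_b, 8))
```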

Re: as i understand rebalance happens on client side

2013-10-01 Thread Guozhang Wang
I do not understand your question, what are you trying to implement? On Tue, Oct 1, 2013 at 8:42 AM, Kane Kane kane.ist...@gmail.com wrote: So essentially you can't do queue pattern, unless you somehow implement locking on the client? On Tue, Oct 1, 2013 at 8:35 AM, Guozhang Wang

bandwidth usage issue

2013-10-01 Thread Yu, Libo
Hi team, Here is a usage case: Assume each host in a Kafka cluster has a gigabit network adaptor. The incoming traffic is 0.8 Gbps and at one point all the traffic goes to one host. The remaining bandwidth is not enough for the followers to replicate messages from this leader. To make sure no

Re: as i understand rebalance happens on client side

2013-10-01 Thread Kane Kane
The reason I was asking is that this library seems to have support only for SimpleConsumer https://github.com/mumrah/kafka-python/, I was curious if all should be implemented on the client or Kafka has some rebalancing logic and prevents consuming from the same queue on the server side in case of

Re: bandwidth usage issue

2013-10-01 Thread Neha Narkhede
This is a capacity planning issue. I think the right thing to do here is to expand the cluster and use the partition reassignment tool to move some partitions over to the new brokers to evenly spread out the load. Thanks, Neha On Tue, Oct 1, 2013 at 8:53 AM, Yu, Libo libo...@citi.com wrote:

Re: as i understand rebalance happens on client side

2013-10-01 Thread Neha Narkhede
We do plan to move the group membership over to the server side and have a very thin consumer client. The proposal is here - https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite#ClientRewrite-ConsumerAPI and this is being planned for the 0.9 release. Once this is complete, the non-java

Re: as i understand rebalance happens on client side

2013-10-01 Thread Kane Kane
Thanks! Direction in that proposal looks very good, I wish that would be implemented already. On Tue, Oct 1, 2013 at 9:01 AM, Neha Narkhede neha.narkh...@gmail.com wrote: We do plan to move the group membership over to the server side and have a very thin consumer client. The proposal is here -

Re: Iterator Question

2013-10-01 Thread Neha Narkhede
It is recommended you use the iterator() API since that invokes Kafka's ConsumerIterator which has state management logic for consuming Kafka messages properly. If you use toIterator(), it just gives you a plain Scala iterator over KafkaStream. Thanks, Neha On Tue, Oct 1, 2013 at 6:03 AM,

Re: as i understand rebalance happens on client side

2013-10-01 Thread Kane Kane
Btw, is it expected to be released on Oct 31? Thanks! On Tue, Oct 1, 2013 at 9:01 AM, Neha Narkhede neha.narkh...@gmail.com wrote: We do plan to move the group membership over to the server side and have a very thin consumer client. The proposal is here -

Re: as i understand rebalance happens on client side

2013-10-01 Thread David Arthur
Kane, I'm the creator of kafka-python, just thought I'd give some insight. Consumer rebalancing is actually pretty tricky to get right. It requires interaction with ZooKeeper which (though possible via kazoo) is something I've tried to avoid in kafka-python. It also seems a little strange to

Re: as i understand rebalance happens on client side

2013-10-01 Thread Kane Kane
Thanks for the reply, David. Your library is great and indeed the rebalancing is currently somewhat quirky and complicated. And I guess it doesn't make sense to implement it considering 0.9 is planned relatively soon. On Tue, Oct 1, 2013 at 10:09 AM, David Arthur mum...@gmail.com wrote: Kane,

Recommendation for number of brokers on kafka(0.7.2) hosts

2013-10-01 Thread rk vishu
Hello All, I am currently using a 5-node Kafka cluster running version 0.7.2. I would like to get some advice on the optimal number of brokers on each Kafka host. Below is the specification of each machine - 4 data directories /data1, /data2, /data3, /data4 with 200+GB usable space. RAID10 - 24 Core CPU -

Re: Recommendation for number of brokers on kafka(0.7.2) hosts

2013-10-01 Thread Neha Narkhede
1) Will setting 4 brokers per host with different ports and different log data directories be beneficial to use all the available space? 2) Will there be any disadvantage using multiple brokers on same host? It is recommended that you do not deploy multiple brokers on the same box since that will

Re: Recommendation for number of brokers on kafka(0.7.2) hosts

2013-10-01 Thread rk vishu
Thank you Neha for the suggestion. On Tue, Oct 1, 2013 at 1:50 PM, Neha Narkhede neha.narkh...@gmail.com wrote: 1) Will setting 4 brokers per host with different ports and different log data directories be beneficial to use all the available space? 2) Will there be any disadvantage using