Re: ConsumerOffsetChecker shows none partitions assigned

2014-10-16 Thread Jun Rao
Which version of ZK are you using? Also, see https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyaretheremanyrebalancesinmyconsumerlog ? Thanks, Jun On Thu, Oct 16, 2014 at 3:29 PM, Hari Gorak wrote: > Project: Kafka > > Issue Type: Bug > > Components: consumer > > Affects Versions: 0

Re: getOffsetsBefore(...) => kafka.common.UnknownException

2014-10-16 Thread Jun Rao
The OffsetRequest can only be answered by the leader of the partition. Did you connect the SimpleConsumer to the leader broker? If not, you need to use TopicMetadataRequest to find out the leader broker first. Thanks, Jun On Thu, Oct 16, 2014 at 3:56 AM, Magnus Vojbacke < magnus.vojba...@digital
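The leader-first lookup described in this reply can be sketched without a Kafka client. The dictionary layout, broker addresses, and the `find_leader` name below are hypothetical stand-ins for what a TopicMetadataRequest would return; this is an illustration of the order of operations, not the Kafka API.

```python
# Hypothetical stand-in for topic metadata:
# topic -> partition id -> (host, port) of the current leader, or None.
metadata = {
    "topic3": {0: ("broker2.example", 9092), 1: None},
}

def find_leader(metadata, topic, partition):
    """Return (host, port) of the partition's leader broker."""
    partitions = metadata.get(topic)
    if partitions is None:
        raise KeyError(f"unknown topic: {topic}")
    if partition not in partitions:
        raise KeyError(f"unknown partition: {topic}/{partition}")
    leader = partitions[partition]
    if leader is None:
        # No leader is elected right now; the caller should refresh metadata and retry.
        raise RuntimeError(f"no leader for {topic}/{partition}")
    return leader

# The offset request should be sent to this broker, not to an arbitrary one.
host, port = find_leader(metadata, "topic3", 0)
```

Only after this step would a client open its connection and issue the offset request against `host` and `port`.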

Re: Monitoring connection with kafka client

2014-10-16 Thread Otis Gospodnetic
Hi, We use our own SPM to monitor our Kafka brokers, producers, and consumers (and ZK) and have alerts and anomaly detection on several key Kafka and ZK metrics. When things break around ZK and/or Kafka, we find out pretty quickly because a lot of metrics suddenly s

ConsumerOffsetChecker shows none partitions assigned

2014-10-16 Thread Hari Gorak
Project: Kafka Issue Type: Bug Components: consumer Affects Versions: 0.8.0 Environment: HP 40 x Intel(R) Xeon(R) CPU E5-2470 v2 @ 2.40GHz/1.2e+02GB bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker shows some partitions having "none" consumers after re-balance triggered due to new co

Re: [DISCUSS] Release 0.8.2-beta before 0.8.2?

2014-10-16 Thread Joe Stein
+1 for a 0.8.2-beta next week and 0.8.2.0 final 4-5 weeks later. I agree with the tickets you brought up to have in 0.8.2-beta and also https://issues.apache.org/jira/browse/KAFKA-1493 for lz4 compression.

Re: Consistency and Availability on Node Failures

2014-10-16 Thread cac...@gmail.com
Knowing that the partitioning is consistent for a given key means that (apart from other benefits) a given consumer only deals with a partition of the keyspace. So if you are in a system with tens of millions of users each consumer only has to store state on a small number of them with inconsistent
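A minimal sketch of why consistent partitioning keeps each consumer's state small: a stable hash of the key always yields the same partition, so every event for one user lands with the same consumer. The CRC32 hash, the partition count, and the sample keys are assumptions chosen for illustration; they are not what Kafka's own partitioner uses.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash (CRC32, chosen for this sketch)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

num_partitions = 8
events = [("user-17", "login"), ("user-42", "click"), ("user-17", "logout")]

# Group events the way partitions would: same key, same partition, original order kept.
by_partition = {}
for key, value in events:
    by_partition.setdefault(partition_for(key, num_partitions), []).append((key, value))
```

Note that changing `num_partitions` changes the mapping, which is one reason adding partitions to a keyed topic needs care.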

Re: getOffsetsBefore(...) => kafka.common.UnknownException

2014-10-16 Thread Neha Narkhede
Do you see any errors on the broker? Are you sure that the consumer's fetch size is set higher than the largest message in your topic? It should be higher than message.max.bytes on the broker (which defaults to 1MB). On Thu, Oct 16, 2014 at 3:56 AM, Magnus Vojbacke < magnus.vojba...@digitalroute

Re: Monitoring connection with kafka client

2014-10-16 Thread Neha Narkhede
If you want to know if the Kafka and zookeeper cluster is healthy or not, you'd want to monitor the cluster directly. Here are pointers for monitoring the Kafka brokers - http://kafka.apache.org/documentation.html#monitoring Thanks, Neha On Thu, Oct 16, 2014 at 3:09 AM, Alex Objelean wrote: > H

Re: [Kafka-users] Producer not distributing across all partitions

2014-10-16 Thread Neha Narkhede
A topic.metadata.refresh.interval.ms of 10 mins means that the producer will take 10 mins to detect new partitions. So newly added or reassigned partitions might not get data for 10 mins. In general, if you're still at prototyping stages, I'd recommend using the new producer available on kafka trun

Re: [DISCUSS] Release 0.8.2-beta before 0.8.2?

2014-10-16 Thread Neha Narkhede
Another JIRA that will be nice to include as part of 0.8.2-beta is https://issues.apache.org/jira/browse/KAFKA-1481 that fixes the mbean naming. Looking for people's thoughts on 2 things here - 1. How do folks feel about doing a 0.8.2-beta release right now and 0.8.2 final 4-5 weeks later? 2. Do p

Re: read N items from topic

2014-10-16 Thread Neha Narkhede
Josh, The consumer's API doesn't allow you to specify N messages, but you can invoke iter.next() as Gwen suggested and count the messages. Note that the iterator can block if you have fewer than N messages, so you will have to carefully design around it. The new consumer's API provides a non-blocking
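The counting approach and the blocking caveat can be sketched with a plain in-memory queue standing in for the consumer iterator. The `take_n` helper and the timeout value are hypothetical; the high-level consumer's `consumer.timeout.ms` setting plays a comparable role by making the iterator throw instead of blocking indefinitely.

```python
import queue

def take_n(messages: "queue.Queue", n: int, timeout_s: float) -> list:
    """Collect up to n messages, giving up once a single wait exceeds timeout_s."""
    taken = []
    while len(taken) < n:
        try:
            taken.append(messages.get(timeout=timeout_s))
        except queue.Empty:
            break  # fewer than n messages were available; return what we have
    return taken

# Stand-in for a topic that currently holds only three messages.
q = queue.Queue()
for i in range(3):
    q.put(f"msg-{i}")

batch = take_n(q, 5, timeout_s=0.1)
```

Without the timeout, the loop would wait forever for the fourth and fifth messages, which is the situation the reply warns about.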

Re: java api code and javadoc

2014-10-16 Thread Ewen Cheslack-Postava
KafkaStream and MessageAndOffset are Scala classes, so you'll find them under the scaladocs. The ConsumerConnector interface should show up in the javadocs with good documentation coverage. Some classes like MessageAndOffset are so simple (just compositions of other data) that they aren't going to

Re: read N items from topic

2014-10-16 Thread gshapira
Using the high level consumer, each consumer in the group can call iter.next() in a loop until they get the number of messages you need. On Thu, Oct 16, 2014 at 10:18 AM, Josh J wrote: > hi, > How do I read N items from a topic? I also would like to do this for a > consume

Re: Consistency and Availability on Node Failures

2014-10-16 Thread gshapira
It may be a minority, I can't tell yet. But in some apps we need to know that a consumer, who is assigned a single partition, will get all data about a subset of users. This is way more flexible than multiple topics since we still have the benefits of partition reassignment, load balancing bet

Re: java api code and javadoc

2014-10-16 Thread 4mayank
Thanks Joseph. I built the javadoc but it's incomplete. Where can I find the code itself for classes like KafkaStream, MessageAndOffset, ConsumerConnector etc? On Wed, Oct 15, 2014 at 11:10 AM, Joseph Lawson wrote: > You probably have to build your own right now. Check out > https://github.com/ap

read N items from topic

2014-10-16 Thread Josh J
hi, How do I read N items from a topic? I also would like to do this for a consumer group, so that each consumer can specify an N number of tuples to read, and each consumer reads distinct tuples. Thanks, Josh

Monitoring connection with kafka client

2014-10-16 Thread Alex Objelean
Hi, I'm trying to monitor the kafka connection on the consumer side. In other words, if the broker cluster is unavailable (or zookeeper dies), I would like to know about that problem as soon as possible. Unfortunately, I didn't find anything useful to achieve that when using the kafka library. Are there

[Kafka-users] Producer not distributing across all partitions

2014-10-16 Thread Mungeol Heo
Hi, I have a question about the 'topic.metadata.refresh.interval.ms' configuration. As far as I know, its default value is 10 minutes. Does it mean that the producer will change the partition every 10 minutes? What I am experiencing is that the producer does not change to another partition every 10 minutes.

Re: Broker brought down and under replicated partitions

2014-10-16 Thread Neha Narkhede
Is there a known issue in the 0.8.0 version that was fixed later on? What can I do to diagnose/fix the situation? Yes, quite a few bugs related to this have been fixed since 0.8.0. I'd suggest upgrading to 0.8.1.1. On Wed, Oct 15, 2014 at 11:09 PM, Jean-Pascal Billaud wrote: > The only thing tha

Re: Kafka/Zookeeper deployment Questions

2014-10-16 Thread Neha Narkhede
In other words, if I change the number of partitions, can I restart the brokers one at a time so that I can continue processing data? Changing the # of partitions is an online operation and doesn't require restarting the brokers. However, any other configuration (with the exception of a few operat

Re: 0.8.x => 0.8.2 upgrade - live & seamless?

2014-10-16 Thread Neha Narkhede
Yes, you should be able to upgrade seamlessly. On Wed, Oct 15, 2014 at 10:07 PM, Otis Gospodnetic < otis.gospodne...@gmail.com> wrote: > Hi, > > Some of our SPM users who are eager to monitor their Kafka 0.8.x clusters > with SPM are asking us whether the upgrade to 0.8.2 from 0.8.1 will be > sea

Re: Consistency and Availability on Node Failures

2014-10-16 Thread Kyle Banker
I didn't realize that anyone used partitions to logically divide a topic. When would that be preferable to simply having a separate topic? Isn't this a minority case? On Thu, Oct 16, 2014 at 7:28 AM, Gwen Shapira wrote: > Just note that this is not a universal solution. Many use-cases care > ab

Re: Cross-Data-Center Mirroring, and Guaranteed Minimum Time Period on Data

2014-10-16 Thread Andrew Otto
Check out Camus. It was built to do parallel loads from Kafka into time bucketed directories in HDFS. On Oct 16, 2014, at 9:32 AM, Gwen Shapira wrote: > I assume the messages themselves contain the timestamp? > > If you use Flume, you can configure a Kafka source to pull data from > Kafka,

Re: Cross-Data-Center Mirroring, and Guaranteed Minimum Time Period on Data

2014-10-16 Thread Gwen Shapira
I assume the messages themselves contain the timestamp? If you use Flume, you can configure a Kafka source to pull data from Kafka, use an interceptor to pull the date out of your message and place it in the event header and then the HDFS sink can write to a partition based on the timestamp. Gwen
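The timestamp-to-partition step can be sketched independently of Flume: take the timestamp carried in the message and derive an hourly directory from it. The base path, topic name, directory layout, and the `bucket_path` helper are assumptions for illustration, not Flume or Camus configuration.

```python
from datetime import datetime, timezone

def bucket_path(base: str, topic: str, epoch_ms: int) -> str:
    """Build an hourly, UTC-based directory path from a message timestamp in milliseconds."""
    ts = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return f"{base}/{topic}/{ts:%Y/%m/%d/%H}"

# A message stamped 2014-10-16 13:32:00 UTC goes into the 13:00 bucket for that day.
path = bucket_path("/data/kafka", "events", 1413466320000)
```

Bucketing by the message's own timestamp rather than arrival time means late data still lands in the directory for the hour it belongs to.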

Re: Consistency and Availability on Node Failures

2014-10-16 Thread Gwen Shapira
Just note that this is not a universal solution. Many use-cases care about which partition you end up writing to since partitions are used to... well, partition logical entities such as customers and users. On Wed, Oct 15, 2014 at 9:03 PM, Jun Rao wrote: > Kyle, > > What you wanted is not supp

getOffsetsBefore(...) => kafka.common.UnknownException

2014-10-16 Thread Magnus Vojbacke
Hi, I’m trying to make a request for offset information from my broker, and I get a kafka.common.UnknownException as the result. I’m trying to use the Simple Consumer API: val topicAndPartition = new TopicAndPartition("topic3", 0) val requestInfo = new java.util.HashMap[TopicAn