Trying to understand 0.9.0 producer and consumer design

2015-12-01 Thread SpikyHawk SpikyHawk
Hi everybody, is there any design document you can point me to for understanding the producer and consumer in Kafka 0.9.0? I am reading https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite but would like to know if it is up to date and reflects the actual implementation. Regards Luciano

Re: 0.9.0.0 release notes link is opening the download mirrors page

2015-12-01 Thread Kris K
Thanks Jun. Missed that part, my bad. Regards, Kris On Mon, Nov 30, 2015 at 4:17 PM, Jun Rao wrote: > Kris, > > It just points to the mirror site. If you click on one of the links, you > will see the release notes. > > Thanks, > > Jun > > On Mon, Nov 30, 2015 at 1:37 PM,

Re: Consumer group disappears and consumers loops

2015-12-01 Thread Jason Gustafson
I've been unable to reproduce this issue running locally. Even with a poll timeout of 1 millisecond, it seems to work as expected. It would be helpful to know a little more about your setup. Are you using SSL? Are the brokers remote? Is the network stable? Thanks, Jason On Tue, Dec 1, 2015 at

Re: New consumer not fetching as quickly as possible

2015-12-01 Thread Jason Gustafson
Hey Tao, other than high latency between the brokers and the consumer, I'm not sure what would cause this. Can you turn on debug logging and run again? I'm looking for any connection problems or metadata/fetch request errors. And I have to ask a dumb question, how do you know that more messages
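
For reference, the debug logging Jason asks for is usually switched on via the client application's log4j configuration; a minimal sketch, assuming a log4j.properties on the classpath (the logger names here are the 0.9 client packages):

    # log4j.properties -- enable client-side debug logging;
    # keeping everything else at INFO keeps the output readable
    log4j.logger.org.apache.kafka.clients.consumer=DEBUG
    log4j.logger.org.apache.kafka.clients.NetworkClient=DEBUG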

Any gotchas upgrading to 0.9?

2015-12-01 Thread Rajiv Kurian
I plan to upgrade both the server and clients to 0.9. Had a few questions before I went ahead with the upgrade: 1. Do all brokers need to be on 0.9? Currently we are running 0.8.2. We'd ideally like to convert only a few brokers to 0.9 and only if we don't see problems convert the rest. 2. Is it

Re: Any gotchas upgrading to 0.9?

2015-12-01 Thread Rajiv Kurian
I saw the upgrade path documentation at http://kafka.apache.org/documentation.html and that kind of answers (1). Not sure if there is anything about client compatibility though. On Tue, Dec 1, 2015 at 8:51 AM, Rajiv Kurian wrote: > I plan to upgrade both the server and

Re: Any gotchas upgrading to 0.9?

2015-12-01 Thread Rajiv Kurian
Also, I remember reading (can't find it now) something about default traffic quotas. I'd hope the default quotas are very large (infinite?) and not small, so that compatibility is maintained. It would be very unfortunate if some of our traffic was throttled after the upgrade because of magic

Re: Number of partitions and disks in a topic

2015-12-01 Thread Todd Palino
Getting the partitioning right now is only important if your messages are keyed. If they’re not, stop reading, start with a fairly low number of partitions, and expand as needed. 1000 partitions per topic is generally not normal. It’s not really a problem, but you want to size topics
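
Todd's point about keyed messages comes down to how the default partitioner behaves: a keyed record lands on (positive hash of key) modulo the partition count, so growing the count later remaps existing keys. A toy illustration (the hash below is a stand-in, not Kafka's actual murmur2):

    // Sketch: why keyed topics need the partition count fixed up front.
    public class KeyedPartitioningSketch {
        public static void main(String[] args) {
            String key = "user-42";
            int hash = key.hashCode() & 0x7fffffff; // illustrative hash only
            // The same key maps to different partitions once the count grows,
            // breaking per-key ordering across the resize.
            System.out.println("8 partitions  -> partition " + (hash % 8));
            System.out.println("16 partitions -> partition " + (hash % 16));
        }
    }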

Re: New consumer not fetching as quickly as possible

2015-12-01 Thread Gerard Klijs
Thanks Tao, it worked. I also played around with my test setup, trying to replicate your results using default settings. But as long as the poll timeout is set to 100ms or larger, the only time-outs I get are near the start and near the end (when indeed there is nothing to consume). This is with

Re: Consumer group disappears and consumers loops

2015-12-01 Thread Jason Gustafson
Hi Martin, I'm also not sure why the poll timeout would affect this. Perhaps the handler is still doing work (e.g. sending requests) when the record set is empty? As a general rule, I would recommend longer poll timeouts. I've actually tended to use Long.MAX_VALUE myself. I'll have a look just
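
A minimal sketch of the long-timeout pattern Jason describes, assuming the 0.9 Java consumer and placeholder broker/topic names:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class LongPollLoop {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "long-poll-demo");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                // A long timeout blocks inside poll() until records arrive,
                // rather than spinning through empty result sets.
                ConsumerRecords<String, String> records = consumer.poll(Long.MAX_VALUE);
                for (ConsumerRecord<String, String> record : records)
                    System.out.println(record.offset() + ": " + record.value());
            }
        }
    }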

RE: Any gotchas upgrading to 0.9?

2015-12-01 Thread Todd Snyder
The quota page is here: http://kafka.apache.org/documentation.html#design_quotas "By default, each unique client-id receives a fixed quota in bytes/sec as configured by the cluster (quota.producer.default, quota.consumer.default)" I also noticed there's been a change in the replication
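
For reference, the two defaults named in that quote are broker-side settings in server.properties; a sketch with made-up values (as noted later in the thread, 0.9 leaves them effectively unlimited unless you configure them):

    # server.properties -- illustrative values only, not defaults
    quota.producer.default=10485760   # 10 MB/sec per client-id
    quota.consumer.default=10485760   # 10 MB/sec per client-id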

Re: New consumer not fetching as quickly as possible

2015-12-01 Thread Gerard Klijs
Another possible reason which comes to my mind is that you have multiple consumer threads, but not the partitions/brokers to support them. When I'm running my tool on multiple threads I get a lot of time-outs. When I only use one consumer thread I get them only at the start and the end. On Wed,

Re: flush() vs close()

2015-12-01 Thread Ewen Cheslack-Postava
Kashif, The difference is that close() will also shut down the producer such that it can no longer send any messages. flush(), in contrast, is useful if you want to make sure that all the messages enqueued so far have been sent and acked, but also want to send more messages after that. -Ewen On
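
A minimal sketch of the distinction Ewen describes, assuming the 0.9 Java producer and placeholder broker/topic names:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class FlushVsClose {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            producer.send(new ProducerRecord<>("my-topic", "batch-1"));
            producer.flush(); // blocks until everything enqueued so far is sent and acked

            producer.send(new ProducerRecord<>("my-topic", "batch-2")); // still allowed

            producer.close(); // flushes, then shuts the producer down for good
            // any producer.send(...) after close() would throw
        }
    }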

Re: New consumer not fetching as quickly as possible

2015-12-01 Thread tao xiao
Hi Jason, You are correct. I initially produced 1 messages in Kafka before I started up my consumer with auto.offset.reset=earliest. But like I said, the majority of the first 10 polls returned 0 messages and the lag remained above 0, which means I still had enough messages to consume. BTW

ZookeeperConsumerConnector error

2015-12-01 Thread Datta, Saurav
Hello, I am getting the below error message:

    kafka.common.MessageStreamsExistException: ZookeeperConsumerConnector can create message streams at most once
        at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:79)
        at
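
The exception message is the answer: createMessageStreams may be called at most once per connector. A hedged sketch of the usual fix with the old high-level consumer, requesting every stream in a single call (topic name and counts are placeholders):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class SingleCreateCall {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "localhost:2181");
            props.put("group.id", "my-group");
            ConsumerConnector connector =
                    Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            // Ask for all topics/streams you need in one call; a second call
            // on the same connector raises MessageStreamsExistException.
            Map<String, Integer> topicCountMap = new HashMap<>();
            topicCountMap.put("my-topic", 2); // two streams for this topic
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                    connector.createMessageStreams(topicCountMap);
            // ... hand each stream to its own worker thread ...
        }
    }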

Re: New consumer not fetching as quickly as possible

2015-12-01 Thread Jason Gustafson
There is some initial overhead before data can be fetched. For example, the group has to be joined and topic metadata has to be fetched. Do you see unexpected empty fetches beyond the first 10 polls? Thanks, Jason On Tue, Dec 1, 2015 at 7:43 PM, tao xiao wrote: > Hi
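
A small sketch of what that warm-up looks like in practice, assuming the 0.9 Java consumer: the first poll also drives the group join and metadata fetch, so a few empty results up front are expected.

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class WarmupPolls {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "warmup-demo");
            props.put("auto.offset.reset", "earliest");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic"));
                int polls = 0;
                ConsumerRecords<byte[], byte[]> records;
                do {
                    // early polls may return nothing while the group join
                    // and metadata fetch complete
                    records = consumer.poll(100);
                    polls++;
                } while (records.count() == 0);
                System.out.println("first records arrived on poll #" + polls);
            }
        }
    }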

Re: Any gotchas upgrading to 0.9?

2015-12-01 Thread Jay Kreps
I think the point is that we should ideally try to cover all these in the "upgrade" notes. -Jay On Tue, Dec 1, 2015 at 10:37 AM, Aditya Auradkar wrote: > Rajiv, > > By default, the quota is unlimited until you decide to configure them > explicitly. > And yes, we did get

Re: Any gotchas upgrading to 0.9?

2015-12-01 Thread Rajiv Kurian
Thanks folks. Is there anywhere I can read about client compatibility, i.e. old clients working with new servers and new clients working with old servers? Thanks, Rajiv On Tue, Dec 1, 2015 at 10:54 AM, Jay Kreps wrote: > I think the point is that we should ideally try to cover all

Re: Any gotchas upgrading to 0.9?

2015-12-01 Thread Aditya Auradkar
Rajiv, By default, the quota is unlimited until you decide to configure them explicitly. And yes, we did get rid of "replica.lag.max.messages", so that configuration will no longer apply. Aditya On Tue, Dec 1, 2015 at 10:24 AM, Todd Snyder wrote: > The quota page is

Re: Any gotchas upgrading to 0.9?

2015-12-01 Thread Guozhang Wang
Old clients should work well with the new servers, while the reverse is not generally true, mainly because the new consumer client uses a few new request types that only the new brokers can recognize. So the normal upgrade path is server-first, clients later. Filed

Re: Spark streaming job hangs

2015-12-01 Thread Paul Leclercq
You might not have enough cores to process data from Kafka > When running a Spark Streaming program locally, do not use “local” or > “local[1]” as the master URL. Either of these means that only one thread > will be used for running tasks locally. If you are using an input DStream > based on a
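
A minimal sketch of the fix the quoted Spark docs imply, using the Java API: give the local master at least two threads, so one can receive and one can process (app name and batch interval are placeholders):

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class StreamingContextSketch {
        public static void main(String[] args) {
            // "local[2]" or more: one thread for the receiver, at least one
            // left over to process the data; "local"/"local[1]" starves the
            // processing side and the job appears to hang.
            SparkConf conf = new SparkConf()
                    .setAppName("kafka-streaming-demo")
                    .setMaster("local[2]");
            JavaStreamingContext ssc =
                    new JavaStreamingContext(conf, Durations.seconds(1));
            // ... create the Kafka DStream and call ssc.start() here ...
            ssc.stop();
        }
    }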

Re: Spark streaming job hangs

2015-12-01 Thread Archit Thakur
Which version of Spark are you running? Have you created a Kafka DirectStream? I am asking because you might or might not be using receivers. Also, when you say it hangs, do you mean there is no other log after this and the process is still up? Or do you mean it kept on adding the jobs but did nothing else? (I am

New consumer not fetching as quickly as possible

2015-12-01 Thread tao xiao
Hi team, I am using the new consumer with broker version 0.9.0. I notice that poll(time) occasionally returns 0 messages even though I have enough messages in the broker. The rate of returning 0 messages is quite high, like 4 out of 5 polls returning 0 messages. It doesn't help to increase the poll

Re: New consumer not fetching as quickly as possible

2015-12-01 Thread Gerard Klijs
I was experimenting with the timeout setting, but as long as messages are produced and the consumer(s) keep polling I saw little difference. I did see, for example, that when producing only 1 message a second, it sometimes still waits to get three messages. So I also would like to know if there is a

Re: What is the benefit of using acks=all and minover e.g. acks=3

2015-12-01 Thread Andreas Flinck
Hi, We have run the tests with your proposed properties, but with the same result. However, we noticed that the Kafka broker only seems to run on 1 out of 72 cores, with 600% CPU usage. It is obviously overloading one core without scaling across threads. The test environment is running RedHat 6.7 and

Re: Consumer group disappears and consumers loops

2015-12-01 Thread Martin Skøtt
Hi Jason, That actually sounds like a very plausible explanation. My current consumer is using the default settings, but I have previously used these (taken from the sample in the Javadoc for the new KafkaConsumer): "auto.commit.interval.ms", "1000" "session.timeout.ms", "3" My consumer

Re: New consumer not fetching as quickly as possible

2015-12-01 Thread tao xiao
Gerard, In your case I think you can set fetch.min.bytes=1 so that the server will answer the fetch request as soon as a single byte of data is available, instead of accumulating enough messages. But in my case I have plenty of messages in the broker and I am sure the size of the total messages are
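
For reference, a sketch of the relevant consumer config lines: fetch.min.bytes is the one tao names (1 is also its documented default in 0.9), and fetch.max.wait.ms, added here for context rather than taken from the thread, bounds how long the broker holds a fetch when the minimum isn't met:

    # consumer configuration -- answer fetches as soon as one byte is ready
    fetch.min.bytes=1
    # upper bound on how long the broker parks a fetch request when the
    # minimum isn't met yet (0.9 documents 500 ms as the default)
    fetch.max.wait.ms=500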

Number of partitions and disks in a topic

2015-12-01 Thread Guillermo Ortiz
Hello, I want to size the Kafka cluster with just one topic, and I'm going to process the data with Spark and other applications. If I have six hard drives per node, is Kafka smart enough to deal with them? I guess that memory should be very important at this point and all data is cached
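
For what it's worth, the broker-side knob for spreading one node's partition logs across multiple drives is log.dirs; a hedged server.properties sketch with placeholder mount points:

    # server.properties -- one data directory per physical drive; the broker
    # places each partition's log in one of the listed directories
    log.dirs=/mnt/disk1/kafka-logs,/mnt/disk2/kafka-logs,/mnt/disk3/kafka-logs,/mnt/disk4/kafka-logs,/mnt/disk5/kafka-logs,/mnt/disk6/kafka-logs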