Re: Intercept broker operation in Kafka

2014-06-24 Thread Daniel Compton
Hi Ravi, You’ve probably seen this already, but I thought I’d point it out just in case: https://kafka.apache.org/documentation.html#monitoring. In our case we are using https://github.com/pingles/kafka-riemann-reporter to send metrics to Riemann, but you could get the metrics through JMX to send
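
For reference, a minimal sketch of pulling broker metrics over JMX from Java, assuming the broker was started with JMX enabled (host, port, and the "kafka" domain filter are assumptions; the exact MBean names differ between 0.8.x releases):

    import java.util.Set;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class ListKafkaMBeans {
        public static void main(String[] args) throws Exception {
            // assumes the broker exposes JMX on port 9999 (e.g. JMX_PORT=9999)
            JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // list every MBean whose domain looks Kafka-related, then forward the
            // interesting attributes to whatever monitoring system you already run
            Set<ObjectName> names = mbsc.queryNames(null, null);
            for (ObjectName name : names) {
                if (name.getDomain().contains("kafka")) {
                    System.out.println(name);
                }
            }
            connector.close();
        }
    }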

Re: Cancel kafka-reassign-partitions Job

2014-06-24 Thread Lung, Paul
Hi Neha, Bug created: https://issues.apache.org/jira/browse/KAFKA-1506 Thank you, Paul Lung On 6/23/14, 9:58 PM, Neha Narkhede neha.narkh...@gmail.com wrote: Paul, Reassignment is stable in 0.8.1.1, so you may be hitting a bug. Nevertheless, could you please file a bug for canceling a

Re: Intercept broker operation in Kafka

2014-06-24 Thread ravi singh
Primarily we want to log the data below (although this is not an exhaustive list): + any error/exception during kafka start/stop + any error/exception while the broker is running + broker state changes like leader re-election, broker goes down + current live brokers + new topic creation + when messages
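
For the live-broker part, a rough sketch, assuming ZooKeeper is used as the source of truth: Kafka 0.8 registers brokers under /brokers/ids, so you can log the current set and watch for changes (connection string and session timeout are placeholders):

    import java.util.List;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class LiveBrokerLogger implements Watcher {
        private final ZooKeeper zk;

        public LiveBrokerLogger(String zkConnect) throws Exception {
            zk = new ZooKeeper(zkConnect, 30000, this);
            logBrokers();
        }

        private void logBrokers() throws Exception {
            // 'true' re-registers this watcher so every membership change is logged
            List<String> ids = zk.getChildren("/brokers/ids", true);
            System.out.println("live brokers: " + ids);
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getType() == Event.EventType.NodeChildrenChanged) {
                try {
                    logBrokers();   // a broker joined or went down
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }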

Kafka 0.8's VerifyConsumerRebalance reports an error

2014-06-24 Thread Yury Ruchin
Hi, I've run into the following problem. I try to read from a 50-partition Kafka topic using high level consumer with 8 streams. I'm using 8-thread pool, each thread handling one stream. After a short time, the threads reading from the stream stop reading. Lag between topic latest offset and the
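
For reference, a minimal sketch of the 0.8 high-level consumer setup being described (topic, group id, and ZooKeeper address are placeholders); one common way this pattern appears to stall is a message handler that blocks, which backs up the fetch queues:

    import java.util.*;
    import java.util.concurrent.*;
    import kafka.consumer.ConsumerConfig;
    import kafka.consumer.KafkaStream;
    import kafka.javaapi.consumer.ConsumerConnector;
    import kafka.message.MessageAndMetadata;

    public class EightStreamConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "zk1:2181");
            props.put("group.id", "my-group");
            ConsumerConnector connector =
                kafka.consumer.Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
            topicCountMap.put("my-topic", 8);   // 8 streams over the 50 partitions
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(topicCountMap);

            ExecutorService pool = Executors.newFixedThreadPool(8);
            for (final KafkaStream<byte[], byte[]> stream : streams.get("my-topic")) {
                pool.submit(new Runnable() {
                    public void run() {
                        for (MessageAndMetadata<byte[], byte[]> m : stream) {
                            // keep this loop fast or hand work off elsewhere;
                            // a blocked handler looks exactly like "threads stop reading"
                            System.out.println(new String(m.message()));
                        }
                    }
                });
            }
        }
    }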

How does number of partitions affect sequential disk IO

2014-06-24 Thread Daniel Compton
I’ve been reading the Kafka docs and one thing that I’m having trouble understanding is how partitions affect sequential disk IO. One of the reasons Kafka is so fast is that you can do lots of sequential IO with read-ahead cache and all of that goodness. However, if your broker is responsible

Re: How does number of partitions affect sequential disk IO

2014-06-24 Thread Paul Mackles
You'll want to account for the number of disks per node. Normally, partitions are spread across multiple disks. Even more important, the OS file cache reduces the amount of seeking provided that you are reading mostly sequentially and your consumers are keeping up. On 6/24/14 3:58 AM, Daniel

Re: Consumer offset is getting reset back to some old value automatically

2014-06-24 Thread Hemath Kumar
Yes Kane, I have the replication factor configured as 3. On Tue, Jun 24, 2014 at 2:42 AM, Kane Kane kane.ist...@gmail.com wrote: Hello Neha, can you explain your statements: Bringing one node down in a cluster will go smoothly only if your replication factor is 1 and you enabled controlled

Re: How does number of partitions affect sequential disk IO

2014-06-24 Thread Daniel Compton
Good point. We've only got two disks per node and two topics so I was planning to have one disk/partition. Our workload is very write heavy so I'm mostly concerned about write throughput. Will we get write speed improvements by sticking to 1 partition/disk or will the difference between 1 and

Re: How does number of partitions affect sequential disk IO

2014-06-24 Thread Paul Mackles
It's probably best to run some tests that simulate your usage patterns. I think a lot of it will be determined by how effectively you are able to utilize the OS file cache, in which case you could have many more partitions. It's a delicate balance but you definitely want to err on the side of having

What's the behavior when Kafka is deleting messages and consumers are still reading

2014-06-24 Thread Lian, Li
When the Kafka broker is trying to delete messages according to the log retention settings, either triggered by log age or topic partition size, and at the same time there are still consumers reading the topic, what will happen? Best regards, Lex Lian Email: ll...@ebay.com

Announcing Kafka Web Console v2.0.0

2014-06-24 Thread Claude Mamo
Announcing the second major release of Kafka Web Console: https://github.com/claudemamo/kafka-web-console/releases/tag/v2.0.0. Highlights: - I've borrowed some ideas from Kafka Offset Monitor and added graphs to show the history of consumers offsets and lag as well as message throughput - Added

Re: Announcing Kafka Web Console v2.0.0

2014-06-24 Thread Joe Stein
Awesome Claude, thanks! Joe Stein, Founder, Principal Consultant, Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop On Tue, Jun

Re: high level consumer not working

2014-06-24 Thread Guozhang Wang
Hi Li Li, If you use the same consumer group id then offsets may have already been committed to Kafka, hence messages before that will not be consumed. Guozhang On Mon, Jun 23, 2014 at 6:09 PM, Li Li fancye...@gmail.com wrote: no luck by adding props.put("auto.offset.reset", "smallest"); but
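
In other words, auto.offset.reset is only consulted when the group has no committed offset yet, so re-reading from the beginning also needs a group id that has never committed offsets. A sketch of the relevant consumer properties (names and the ZooKeeper address are illustrative):

    Properties props = new Properties();
    props.put("zookeeper.connect", "zk1:2181");
    // a brand-new group id: the old group's committed offsets would otherwise win
    props.put("group.id", "reprocess-2014-06-24");
    // only takes effect when no committed offset exists for this group
    props.put("auto.offset.reset", "smallest");
    ConsumerConfig config = new ConsumerConfig(props);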

Re: What's the behavior when Kafka is deleting messages and consumers are still reading

2014-06-24 Thread Guozhang Wang
Hi Li, The log operations are protected by a lock, so if there is a concurrent read on this partition it will not be deleted. But then when it is deleted the next fetch/read will result in an OffsetOutOfRange exception and the consumer needs to restart from an offset reset value. Guozhang On
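
For anyone on the simple consumer, a rough sketch of handling that case against the 0.8 API (host, topic, and partition are placeholders); the high-level consumer does the equivalent internally, driven by auto.offset.reset:

    import java.util.HashMap;
    import java.util.Map;
    import kafka.api.FetchRequest;
    import kafka.api.FetchRequestBuilder;
    import kafka.api.PartitionOffsetRequestInfo;
    import kafka.common.ErrorMapping;
    import kafka.common.TopicAndPartition;
    import kafka.javaapi.FetchResponse;
    import kafka.javaapi.OffsetResponse;
    import kafka.javaapi.consumer.SimpleConsumer;

    SimpleConsumer consumer = new SimpleConsumer("broker1", 9092, 100000, 64 * 1024, "my-client");
    long offset = 0L;   // wherever this consumer left off
    FetchRequest req = new FetchRequestBuilder()
            .clientId("my-client")
            .addFetch("my-topic", 0, offset, 100000)
            .build();
    FetchResponse resp = consumer.fetch(req);
    if (resp.hasError()
            && resp.errorCode("my-topic", 0) == ErrorMapping.OffsetOutOfRangeCode()) {
        // the requested offset was already deleted by retention:
        // restart from the earliest offset still present on the broker
        TopicAndPartition tp = new TopicAndPartition("my-topic", 0);
        Map<TopicAndPartition, PartitionOffsetRequestInfo> info =
                new HashMap<TopicAndPartition, PartitionOffsetRequestInfo>();
        info.put(tp, new PartitionOffsetRequestInfo(kafka.api.OffsetRequest.EarliestTime(), 1));
        OffsetResponse offsets = consumer.getOffsetsBefore(new kafka.javaapi.OffsetRequest(
                info, kafka.api.OffsetRequest.CurrentVersion(), "my-client"));
        offset = offsets.offsets("my-topic", 0)[0];
    }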

Experiences with larger message sizes

2014-06-24 Thread Denny Lee
By any chance has anyone worked with using Kafka with message sizes that are approximately 50MB in size? Based on some of the previous threads there are probably some concerns on memory pressure due to the compression on the broker and decompression on the consumer and a best practices on
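
For context, the settings that usually come up for large messages in 0.8, shown as a consumer fragment with the broker-side values as comments (all numbers are just illustrative; messages this size also mean similarly sized fetch buffers per partition on consumers and followers):

    Properties props = new Properties();
    props.put("zookeeper.connect", "zk1:2181");
    props.put("group.id", "large-msg-group");
    // must be at least as large as the biggest message the broker may return
    props.put("fetch.message.max.bytes", String.valueOf(64 * 1024 * 1024));

    // broker side (server.properties), for the same reason:
    //   message.max.bytes=67108864
    //   replica.fetch.max.bytes=67108864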

Apache Kafka Commercial Support

2014-06-24 Thread Diego Alvarez Zuluaga
Hello Are there any vendors who provide commercial support for Kafka? We're very interested in using Kafka, but our infrastructure team asks us (DevTeam) for commercial support. Tks diegoalva...@sura.com.co http://www.gruposuramericana.com/en/Pages/OurPortfolio/Suramericana.aspx

Re: restarting a broker during partition reassignment

2014-06-24 Thread Luke Forehand
My hypothesis for how Partition [luke3,3] with leader 11 had its offset reset to zero, caused by a reboot of the leader broker during partition reassignment: the replicas for [luke3,3] were in the process of being reassigned from brokers 10,11,12 to 11,12,13. I rebooted broker 11, which was the leader for

Re: Using GetOffsetShell against non-existent topic creates the topic which cannot be deleted

2014-06-24 Thread Luke Forehand
Definitely, this is version 0.8.1.1 https://issues.apache.org/jira/browse/KAFKA-1507 Luke Forehand | Networked Insights | Software Engineer On 6/23/14, 6:58 PM, Guozhang Wang wangg...@gmail.com wrote: Luke, Thanks for the findings, could you file a JIRA to keep track of this bug?

Re: Apache Kafka Commercial Support

2014-06-24 Thread Joe Stein
Hi Diego, Big Data Open Source Security LLC https://www.linkedin.com/company/big-data-open-source-security-llc provides commercial support around Apache Kafka. We currently do this at a retained professional service rate (so the cost is not based on nodes or volume like product vendors). We have also

Re: How does number of partitions affect sequential disk IO

2014-06-24 Thread Jay Kreps
The primary relevant factor here is the fsync interval. Kafka's replication guarantees do not require fsyncing every message, so the reason for doing so is to handle correlated power loss (a pretty uncommon failure in a real data center). Replication will handle most other failure modes with much

Re: Experiences with larger message sizes

2014-06-24 Thread Joe Stein
Hi Denny, have you considered saving those files to HDFS and sending the event information to Kafka? You could then pass that off to Apache Spark in a consumer and get data locality for the file saved (or something of the sort [no pun intended]). You could also stream every line (or however you

Re: Experiences with larger message sizes

2014-06-24 Thread Denny Lee
Hey Joe, yes, I have - my original plan was to do something similar to what you suggested, which was to simply push the data into HDFS / S3 and then have only the event information within Kafka, so that multiple consumers can just read the event information and ping HDFS/S3 for the actual

Re: Experiences with larger message sizes

2014-06-24 Thread Joe Stein
You could then chunk the data (wrapped in an outer message so you have meta data like file name, total size, current chunk size) and produce that with the partition key being filename. We are in progress working on a system for doing file loading to Kafka (which will eventually support both
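
A rough sketch of that chunking idea with the 0.8 producer, keying on the file name so all chunks of one file land in the same partition in order (topic name, chunk size, and the 12-byte header layout are made up for illustration):

    import java.nio.ByteBuffer;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    Properties props = new Properties();
    props.put("metadata.broker.list", "broker1:9092,broker2:9092");
    props.put("serializer.class", "kafka.serializer.DefaultEncoder");      // byte[] payload
    props.put("key.serializer.class", "kafka.serializer.StringEncoder");   // String key
    Producer<String, byte[]> producer = new Producer<String, byte[]>(new ProducerConfig(props));

    String fileName = "input.bin";
    byte[] file = Files.readAllBytes(Paths.get("/data/" + fileName));   // throws IOException
    int chunkSize = 1024 * 1024;
    int totalChunks = (file.length + chunkSize - 1) / chunkSize;
    for (int i = 0; i < totalChunks; i++) {
        int from = i * chunkSize;
        int len = Math.min(chunkSize, file.length - from);
        ByteBuffer chunk = ByteBuffer.allocate(12 + len);
        chunk.putInt(i).putInt(totalChunks).putInt(len);   // tiny header: chunk no, total, size
        chunk.put(file, from, len);
        // key = file name, so the default partitioner keeps a file's chunks together
        producer.send(new KeyedMessage<String, byte[]>("file-chunks", fileName, chunk.array()));
    }
    producer.close();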

Re: What's the behavior when Kafka is deleting messages and consumers are still reading

2014-06-24 Thread Neha Narkhede
The behavior on the consumer in this case is governed by the value of the auto.offset.reset config. Depending on this config, it will reset its offset to either the earliest or the latest in the log. Thanks, Neha On Tue, Jun 24, 2014 at 8:17 AM, Guozhang Wang wangg...@gmail.com wrote: Hi Li,

Re: Consumer offset is getting reset back to some old value automatically

2014-06-24 Thread Neha Narkhede
Can you elaborate your notion of smooth? I thought if you have replication factor=3 in this case, you should be able to tolerate loss of a node? Yes, you should be able to tolerate the loss of a node but if controlled shutdown is not enabled, the delay between loss of the old leader and election

Re: Kafka 0.8's VerifyConsumerRebalance reports an error

2014-06-24 Thread Neha Narkhede
Is it possible that maybe the zookeeper url used for the VerifyConsumerRebalance tool is incorrect? On Tue, Jun 24, 2014 at 12:02 AM, Yury Ruchin yuri.ruc...@gmail.com wrote: Hi, I've run into the following problem. I try to read from a 50-partition Kafka topic using high level consumer

Re: Kafka 0.8's VerifyConsumerRebalance reports an error

2014-06-24 Thread Yury Ruchin
I've just double-checked. The URL is correct, the same one is used by Kafka clients. 2014-06-24 22:21 GMT+04:00 Neha Narkhede neha.narkh...@gmail.com: Is it possible that maybe the zookeeper url used for the VerifyConsumerRebalance tool is incorrect? On Tue, Jun 24, 2014 at 12:02 AM, Yury

Re: Consumer offset is getting reset back to some old value automatically

2014-06-24 Thread Kane Kane
Hello Neha, ZK cluster of 3 nodes will tolerate the loss of 1 node, but if there is a subsequent leader election for any reason, there is a chance that the cluster does not reach a quorum. It is less likely but still risky to some extent. Does it mean if you have to tolerate 1 node loss without

Re: Consumer offset is getting reset back to some old value automatically

2014-06-24 Thread Kane Kane
Sorry, i meant 5 nodes in previous question. On Tue, Jun 24, 2014 at 12:36 PM, Kane Kane kane.ist...@gmail.com wrote: Hello Neha, ZK cluster of 3 nodes will tolerate the loss of 1 node, but if there is a subsequent leader election for any reason, there is a chance that the cluster does not

Re: How does number of partitions affect sequential disk IO

2014-06-24 Thread Daniel Compton
Thanks Jay, that's exactly what I was looking for. On 25 June 2014 04:18, Jay Kreps jay.kr...@gmail.com wrote: The primary relevant factor here is the fsync interval. Kafka's replication guarantees do not require fsyncing every message, so the reason for doing so is to handle correlated

Re: Kafka 'reassign-partitions' behavior if the replica does not catches up

2014-06-24 Thread Neha Narkhede
We have a JIRA to track the cancel feature - https://issues.apache.org/jira/browse/KAFKA-1506. Thanks, Neha On Tue, Jun 24, 2014 at 1:32 PM, Virendra Pratap Singh vpsi...@yahoo-inc.com.invalid wrote: In process of giving 0.8.1.1 a try. However I believe the question still holds true. If

Re: Kafka 0.8's VerifyConsumerRebalance reports an error

2014-06-24 Thread Neha Narkhede
I would turn on DEBUG on the tool to see which url it reads and why it doesn't find the owners. On Tue, Jun 24, 2014 at 11:28 AM, Yury Ruchin yuri.ruc...@gmail.com wrote: I've just double-checked. The URL is correct, the same one is used by Kafka clients. 2014-06-24 22:21 GMT+04:00 Neha

Re: Consumer offset is getting reset back to some old value automatically

2014-06-24 Thread Neha Narkhede
See the explanation from the zookeeper folks here https://zookeeper.apache.org/doc/r3.3.2/zookeeperAdmin.html - Because Zookeeper requires a majority, it is best to use an odd number of machines. For example, with four machines ZooKeeper can only handle the failure of a single machine; if two
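
A quick illustration of that majority rule: an ensemble of n servers needs a quorum of floor(n/2)+1, so it tolerates floor((n-1)/2) failures, which is why 3 and 4 nodes both tolerate exactly one.

    // ensemble=3 quorum=2 tolerated=1, ensemble=4 quorum=3 tolerated=1,
    // ensemble=5 quorum=3 tolerated=2, ensemble=6 quorum=4 tolerated=2, ...
    for (int n = 3; n <= 7; n++) {
        System.out.printf("ensemble=%d quorum=%d tolerated failures=%d%n",
                n, n / 2 + 1, (n - 1) / 2);
    }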

Kafka 0.8/VIP/SSL

2014-06-24 Thread Reiner Stach
I'm looking for advice on running Kafka 0.8 behind VIPs. The goal is to support SSL traffic, with encryption and decryption being performed by back-to-back VIPs at the client and in front of the broker. That is: Kafka client -- vip1a.myco.com:8080 (SSL encrypt) --- WAN --- VIP 1b (SSL

Re: What's the behavior when Kafka is deleting messages and consumers are still reading

2014-06-24 Thread Lian, Li
GuoZhang, Thanks for explaining. I thought there might be such a lock mechanism but could not confirm it in any documentation on the Kafka website. It would be better if this could be written down in some Wiki or FAQ. Best regards, Lex Lian Email: ll...@ebay.com On 6/24/14, 11:17 PM,

Re: What's the behavior when Kafka is deleting messages and consumers are still reading

2014-06-24 Thread Guozhang Wang
I think we can probably update the documentation page for this: https://kafka.apache.org/documentation.html#compaction On Tue, Jun 24, 2014 at 3:54 PM, Lian, Li ll...@ebay.com wrote: GuoZhang, Thanks for explaining. I thought there might be such a lock mechanism but could not

Monitoring Producers at Large Scale

2014-06-24 Thread Bhavesh Mistry
We use Kafka as a transport layer for application logs. How do we monitor producers at large scale: about 6000 boxes x 4 topics per box, so roughly 24000 producers (spread across multiple data centers; we have brokers per DC)? We do the monitoring based on logs. I have tried intercepting

Blacklisting Brokers

2014-06-24 Thread Lung, Paul
Hi All, Is there any way to blacklist brokers? Sometimes we run into situations where there are certain hardware failures on a broker machine, and the machine goes into a “half dead” state. The broker process is up and participating in the cluster, but it can’t actually transmit messages

kafka.common.LeaderNotAvailableException

2014-06-24 Thread Zack Payton
Hi all, I have 3 zookeeper servers and 2 Kafka servers. Running Kafka version 0.8.1.1. Running zookeeper 3.3.5-cdh3u6. From the Kafka servers I can access the zookeeper servers on 2181. From one of the Kafka servers I can create a topic no problem: [root@kafka1 kafka-0.8.1.1-src]#
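
One quick check from Java, assuming the 0.8 simple consumer API: ask either broker for metadata on test1 and see whether a leader is actually assigned for each partition (host, port, and client id are placeholders):

    import java.util.Collections;
    import kafka.javaapi.PartitionMetadata;
    import kafka.javaapi.TopicMetadata;
    import kafka.javaapi.TopicMetadataRequest;
    import kafka.javaapi.TopicMetadataResponse;
    import kafka.javaapi.consumer.SimpleConsumer;

    SimpleConsumer consumer = new SimpleConsumer("kafka1", 9092, 100000, 64 * 1024, "leader-check");
    TopicMetadataResponse resp =
        consumer.send(new TopicMetadataRequest(Collections.singletonList("test1")));
    for (TopicMetadata topic : resp.topicsMetadata()) {
        for (PartitionMetadata p : topic.partitionsMetadata()) {
            // leader() is null while no leader has been elected for the partition
            System.out.println("partition " + p.partitionId()
                    + " leader=" + p.leader() + " error=" + p.errorCode());
        }
    }
    consumer.close();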

Re: kafka.common.LeaderNotAvailableException

2014-06-24 Thread Joe Stein
Are there any errors in the broker's logs? Joe Stein, Founder, Principal Consultant, Big Data Open Source Security LLC http://www.stealth.ly Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop

Re: kafka.common.LeaderNotAvailableException

2014-06-24 Thread Zack Payton
server.log has a lot of these errors: [2014-06-24 20:07:16,124] ERROR [KafkaApi-6] error when handling request Name: FetchRequest; Version: 0; CorrelationId: 81138; ClientId: ReplicaFetcherThread-0-5; ReplicaId: 6; MaxWait: 500 ms; MinBytes: 1 bytes; RequestInfo: [test1,0] -

Re: Uneven distribution of kafka topic partitions across multiple brokers

2014-06-24 Thread Joe Stein
Take a look at bin/kafka-reassign-partitions.sh. From its option help: --broker-list <brokerlist> The list of brokers to which the partitions need to be

Getting java.io.IOException: Too many open files

2014-06-24 Thread Lung, Paul
Hi All, I just upgraded my cluster from 0.8.1 to 0.8.1.1. I’m seeing the following error messages on the same 3 brokers once in a while: [2014-06-24 21:43:44,711] ERROR Error in acceptor (kafka.network.Acceptor) java.io.IOException: Too many open files at

Re: Getting java.io.IOException: Too many open files

2014-06-24 Thread Prakash Gowri Shankor
How many files does each broker itself have open? You can find this from 'ls -l /proc/processid/fd'. On Tue, Jun 24, 2014 at 10:18 PM, Lung, Paul pl...@ebay.com wrote: Hi All, I just upgraded my cluster from 0.8.1 to 0.8.1.1. I’m seeing the following error messages on the same 3 brokers

Re: Getting java.io.IOException: Too many open files

2014-06-24 Thread Lung, Paul
The controller machine has 3500 or so, while the other machines have around 1600. Paul Lung On 6/24/14, 10:31 PM, Prakash Gowri Shankor prakash.shan...@gmail.com wrote: How many files does each broker itself have open ? You can find this from 'ls -l /proc/processid/fd' On Tue, Jun 24, 2014