Re: 0.7 design doc?

2015-03-02 Thread Philip O'Toole
On Saturday, February 28, 2015 9:33 PM, Guozhang Wang wangg...@gmail.com wrote: Is this what you are looking for? http://kafka.apache.org/07/documentation.html On Fri, Feb 27, 2015 at 7:02 PM, Philip O'Toole philip.oto...@yahoo.com.invalid wrote: There used to be a very lucid page available

0.7 design doc?

2015-02-27 Thread Philip O'Toole
There used to be a very lucid page available describing Kafka 0.7, its design, and the rationale behind certain decisions. I last saw it about 18 months ago. I can't find it now. Is it still available? I can find the 0.8 version; it's up there on the site. Any help? Any links? Philip

Re: AWS EC2 deployment best practices

2014-09-30 Thread Philip O'Toole
To answer your question, I was thinking ephemerals with replication, yes. With a reservation, it's pretty easy to get e.g. two i2.xlarge for an amortized cost below a single m2.2xlarge with the same amount of EBS storage and provisioned IOPS. On Mon, Sep 29, 2014 at 9:40 PM, Philip O'Toole philip.oto

Re: AWS EC2 deployment best practices

2014-09-29 Thread Philip O'Toole
If only Kafka had rack awareness, you could run 1 cluster and set up the replicas in different AZs. https://issues.apache.org/jira/browse/KAFKA-1215 As for your question about ephemeral versus EBS, I presume you are proposing to use ephemeral *with* replicas, right? Philip

Re: Use case

2014-09-05 Thread Philip O'Toole
Yes, IMHO, that is going to be way too many topics. Use a smaller number of topics, and embed attributes like tag and user in the messages written to Kafka. Philip - http://www.philipotoole.com On Friday, September 5, 2014 4:21 AM, Sharninder

Re: Use case

2014-09-04 Thread Philip O'Toole
Agreed. I can't see this being a good use for Kafka. Philip - http://www.philipotoole.com On Thursday, September 4, 2014 9:57 PM, Sharninder sharnin...@gmail.com wrote: Since you want all chats and mail history persisted all the time, I personally

Re: High Level Consumer and Commit

2014-09-03 Thread Philip O'Toole
thread (single ConsumerStream) and use commitOffset API to commit all partitions managed by each ConsumerConnector after the thread finished processing the messages. Does that solve the problem, Bhavesh? Gwen On Tue, Sep 2, 2014 at 5:47 PM, Philip O'Toole philip.oto...@yahoo.com.invalid wrote: Yeah

Re: High Level Consumer and Commit

2014-09-03 Thread Philip O'Toole
The only problem is the number of connections to Kafka is increased. *Why* is it a problem? Philip

Re: High Level Consumer and Commit

2014-09-02 Thread Philip O'Toole
Either use the SimpleConsumer which gives you much finer-grained control, or (this worked with 0.7) spin up a ConsumerConnection (this is a HighLevel consumer concept) per partition, turn off auto-commit. Philip - http://www.philipotoole.com On
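
For illustration, a minimal sketch of the second approach, written against the 0.8-era high-level consumer Java API (0.7 used slightly different property names, e.g. zk.connect and autocommit.enable); the ZK address, group, and topic below are hypothetical:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class PerPartitionConsumers {
    // Build a high-level consumer connector with auto-commit disabled, so
    // offsets are only committed when the application says so.
    static ConsumerConnector newConnector() {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zk1:2181");  // hypothetical ZK address
        props.put("group.id", "my-group");
        props.put("auto.commit.enable", "false");    // commit manually instead
        return Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
    }

    public static void main(String[] args) {
        int numPartitions = 4;  // assumed partition count for the topic
        for (int i = 0; i < numPartitions; i++) {
            ConsumerConnector connector = newConnector();
            Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(Collections.singletonMap("my-topic", 1));
            KafkaStream<byte[], byte[]> stream = streams.get("my-topic").get(0);
            // Hand 'stream' and its 'connector' off to a worker thread; the
            // group rebalance spreads the partitions across the connectors.
        }
    }
}
```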

Re: High Level Consumer and Commit

2014-09-02 Thread Philip O'Toole
only when batch is done. Thanks, Bhavesh On Tue, Sep 2, 2014 at 4:43 PM, Philip O'Toole philip.oto...@yahoo.com.invalid wrote: Either use the SimpleConsumer which gives you much finer-grained control, or (this worked with 0.7) spin up a ConsumerConnection (this is a HighLevel consumer

Re: High Level Consumer and Commit

2014-09-02 Thread Philip O'Toole
management. Thanks, Bhavesh On Tue, Sep 2, 2014 at 5:20 PM, Philip O'Toole philip.oto...@yahoo.com.invalid wrote: No, you'll need to write your own failover. I'm not sure I follow your second question, but the high-level Consumer should be able to do what you want if you disable auto

Re: How retention is working

2014-08-25 Thread Philip O'Toole
Retention is per topic, per Kafka broker; it has nothing to do with the Producer. You do not need to restart the Producer for retention changes to take effect. You do, however, need to restart the broker. Once restarted, all messages will then be subject to the new policy. Philip

Re: Data inputs for Kafka.

2014-08-20 Thread Philip O'Toole
Kafka can ingest any kind of data, and connect to many types of systems. Much work exists in this area already, for hooking a wide variety of systems to Kafka. If your system isn't supported, then you write a Kafka Producer to pull (or receive) messages from your system, and write them to

Re: Which is better?

2014-08-20 Thread Philip O'Toole
If you haven't studied the docs yet, you should, as this is a broad question which needs background to understand the answer. But in summary, the high-level Consumer does more for you, and importantly, provides balancing between Consumers. The SimpleConsumer does less for you, but gives you more

Re: Keep on getting kafka.common.OffsetOutOfRangeException: Random times

2014-08-20 Thread Philip O'Toole
It's not a bug, right? It's the way the system works (if I have been following the thread correctly) -- when the retention time passes, the message is gone. Either consume your messages sooner, or increase your retention time. Kafka is not magic, it can only do what it's told. In practice I

Re: Announce: Capillary, a monitor for Kafka 0.8 spout topologies -- and the upgrade to 0.8

2014-08-20 Thread Philip O'Toole
On Wed, Aug 20, 2014 at 10:04 AM, Philip O'Toole philip.oto...@yahoo.com.invalid wrote: Nice work. That tool I put together was getting a bit old. :-) I updated the Kafka ecosystem page with details of both tools. https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem Philip

Re: Using kafka in non million users environment

2014-08-19 Thread Philip O'Toole
- we have a low data traffic compared to your figures: around 30 GB a day. Will it be an issue? I have personal experience that Kafka deals extremely well with very low-volumes, as well as very high. I have used Kafka for small integration-test setups, as well as large production systems.

Re: Issue with 240 topics per day

2014-08-12 Thread Philip O'Toole
Todd -- can you share details of the ZK cluster you are running, to support this scale? Is it one single Kafka cluster? Are you using 1 single ZK cluster? Thanks, Philip   - http://www.philipotoole.com On Monday, August 11, 2014 9:32 PM, Todd Palino

Re: Issue with 240 topics per day

2014-08-11 Thread Philip O'Toole
I'd love to know more about what you're trying to do here. It sounds like you're trying to create topics on a schedule, trying to make it easy to locate data for a given time range? I'm not sure it makes sense to use Kafka in this manner. Can you provide more detail? Philip  

Re: Issue with 240 topics per day

2014-08-11 Thread Philip O'Toole
On Mon, Aug 11, 2014 at 5:01 PM, Philip O'Toole philip.oto...@yahoo.com.invalid wrote: I'd love to know more about what you're trying to do here. It sounds like you're trying to create topics on a schedule, trying to make it easy to locate data for a given time range? I'm not sure it makes sense

Re: Issue with 240 topics per day

2014-08-11 Thread Philip O'Toole
another queue system. Chen Chen On Mon, Aug 11, 2014 at 6:07 PM, Philip O'Toole philip.oto...@yahoo.com.invalid wrote: It's still not clear to me why you need to create so many topics. Write the data to a single topic and consume it when it arrives. It doesn't matter if it arrives

Re: Issue with 240 topics per day

2014-08-11 Thread Philip O'Toole
cluster, with the hope that the topic deletion API will be available soon. Meantime just have a cron job cleaning up the outdated topics from zookeeper. Let me know what you think, Thanks, Chen On Mon, Aug 11, 2014 at 6:53 PM, Philip O'Toole philip.oto...@yahoo.com.invalid wrote: Why

Re: consumer rebalance weirdness

2014-08-07 Thread Philip O'Toole
I think the question is what in your consuming application could cause it not to check in with ZK for longer than the timeout.   - http://www.philipotoole.com On Thursday, August 7, 2014 8:16 AM, Jason Rosenberg j...@squareup.com wrote: Well, it's

Re: consumer rebalance weirdness

2014-08-07 Thread Philip O'Toole
A big GC pause in your application, for example, could do it. Philip   - http://www.philipotoole.com On Thursday, August 7, 2014 11:56 AM, Philip O'Toole philip.oto...@yahoo.com wrote: I think the question is what in your consuming application

Re: Apache webserver access logs + Kafka producer

2014-08-07 Thread Philip O'Toole
Fluentd might work, or simply configure rsyslog or syslog-ng on the box to watch the Apache log files and send them to a suitable Producer (for example, I wrote something that will accept messages from a syslog client and stream them to Kafka: https://github.com/otoolep/syslog-gollector)

Re: consumer rebalance weirdness

2014-08-07 Thread Philip O'Toole
listeners are in separate async threads (and that's what it looks like looking at the kafka consumer code). Maybe I should increase the zk session timeout and see if that helps. On Thu, Aug 7, 2014 at 2:56 PM, Philip O'Toole philip.oto...@yahoo.com.invalid wrote: A big GC pause in your

Re: Interested in contributing to Kafka?

2014-07-18 Thread Philip O'Toole
On Thu, Jul 17, 2014 at 9:28 PM, Philip O'Toole philip_o_to...@yahoo.com.invalid wrote: First things first. I friggin' think Kafka rocks. It's a system that has given me a lot of joy, and I've spent a lot of fun hours (and sometimes not so fun) looking at consumer lag metrics. I'd like

Re: Interested in contributing to Kafka?

2014-07-17 Thread Philip O'Toole
First things first. I friggin' think Kafka rocks. It's a system that has given me a lot of joy, and I've spent a lot of fun hours (and sometimes not so fun) looking at consumer lag metrics. I'd like to give back, beyond spreading the gospel about it architecturally and operationally. My only

Re: Has anybody successfully integrated Kafka jar for Android app.

2014-07-16 Thread Philip O'Toole
FWIW, I happen to know Subodh -- we worked together many years back. We discussed this a little off-the-list, but perhaps my thoughts might be of wider interest. Kafka, in my experience, works best when Producers have a persistent TCP connection to the Broker(s) (and possibly Zookeeper). I

Re: Has anybody successfully integrated Kafka jar for Android app.

2014-07-16 Thread Philip O'Toole
You should find code here that will help you put together an HTTP app server that writes to Kafka on the back-end. https://cwiki.apache.org/confluence/display/KAFKA/Clients https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem On Wednesday, July 16, 2014 9:36 PM, Philip O'Toole

Syslog Collector to Kafka 0.8 -- in Go

2014-07-11 Thread Philip O'Toole
I went looking for a Syslog Collector, written in Go, which would stream to Kafka. I couldn't find any, so I put one together myself -- others might be interested. It optionally performs basic parsing of an RFC5424 header too, before streaming the messages to Kafka. As always, YMMV.

Re: 0.72 Consumer: message is invalid, compression codec: NoCompressionCodec

2014-02-11 Thread Philip O'Toole
On Feb 11, 2014, at 7:45 AM, Jun Rao jun...@gmail.com wrote: We do catch the exception. However, we don't know what to do with it. Retrying may not fix the problem. So, we just log it and let the thread die. Thanks, Jun On Mon, Feb 10, 2014 at 8:42 PM, Philip O'Toole phi...@loggly.com

0.72 Consumer: message is invalid, compression codec: NoCompressionCodec

2014-02-10 Thread Philip O'Toole
Saw this thrown today, which brought down a Consumer thread -- we're using Consumers built on the High-level consumer framework. What may have happened here? We are using a custom C++ Producer which does not do compression, and which hasn't changed in months, but this error is relatively new to

Re: 0.72 Consumer: message is invalid, compression codec: NoCompressionCodec

2014-02-10 Thread Philip O'Toole
I should say we *think* this exception brought down the Consumer thread. The problematic partition on our system was 2-29, so this is definitely the related thread. Philip On Mon, Feb 10, 2014 at 5:00 PM, Philip O'Toole phi...@loggly.com wrote: Saw this thrown today, which brought down a Consumer

Re: 0.72 Consumer: message is invalid, compression codec: NoCompressionCodec

2014-02-10 Thread Philip O'Toole
validation failed. Is there any issue with the network? Thanks, Jun On Mon, Feb 10, 2014 at 5:00 PM, Philip O'Toole phi...@loggly.com wrote: Saw this thrown today, which brought down a Consumer thread -- we're using Consumers built on the High-level consumer framework. What may have

Re: C++ Producer = Broker = Java Consumer?

2014-01-31 Thread Philip O'Toole
Is this a Kafka C++ lib you wrote yourself, or some open-source library? What version of Kafka? Philip On Fri, Jan 31, 2014 at 1:30 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, If Kafka Producer is using a C++ Kafka lib to produce messages, how can Kafka Consumers written in

Re: C++ Producer = Broker = Java Consumer?

2014-01-31 Thread Philip O'Toole
C++ program writes bytes to Kafka, and Java reads bytes from Kafka. Is there something special about the way the messages are being serialized in C++? --Tom On Fri, Jan 31, 2014 at 2:36 PM, Philip O'Toole phi...@loggly.com wrote: Is this a Kafka C++ lib you wrote yourself, or some open-source

Re: How to design a robust producer?

2014-01-30 Thread Philip O'Toole
and data will be lost. Do you have by any chance a pointer to an existing implementation of such a producer? Thanks On Jan 30, 2014 at 15:13, Philip O'Toole phi...@loggly.com wrote: What exactly are you struggling with? Your question is too broad. What you want to do is eminently possible

max.message.size, 0.7 SyncProducer, and compression

2014-01-28 Thread Philip O'Toole
http://kafka.apache.org/07/configuration.html Hello -- I can look at the code too, but how does this setting interact with compression? After all, a Producer doing compression doesn't know the size of a message on the wire it will send to a Kafka broker until after it has been compressed. And

Re: Consumer Events Problem

2013-12-09 Thread Philip O'Toole
What version are you running? Philip On Mon, Dec 9, 2013 at 4:30 AM, Sanket Maru san...@ecinity.com wrote: I am working on a small project and discovered that our consumer hasn't been executed for over a month now. How can I check the unprocessed events? From which date the events are

Re: Consumer Events Problem

2013-12-09 Thread Philip O'Toole
OK, I am only familiar with 0.72. Philip On Mon, Dec 9, 2013 at 4:54 AM, Sanket Maru san...@ecinity.com wrote: I am using kafka 0.8.0 On Mon, Dec 9, 2013 at 6:09 PM, Philip O'Toole phi...@loggly.com wrote: What version are you running? Philip On Mon, Dec 9, 2013 at 4:30 AM

Re: Consuming backwards?

2013-12-06 Thread Philip O'Toole
Take apart the hard disk, and flip the magnets in the motors so it spins in reverse. The Kafka software won't be any the wiser. That should give you exactly what you need, combined with high-performance sequential reads. :-D On Dec 6, 2013, at 7:43 AM, Joe Stein joe.st...@stealth.ly wrote:

Preventing topics from appearing on specific brokers

2013-12-05 Thread Philip O'Toole
Hello, Say we are using Zookeeper-based Producers, and we specify a topic to be written to. Since we don't specify the actual brokers, is there a way to prevent a topic from appearing on a specific broker? What if we set the topic's partition count to 0 on the broker we don't want it to appear on?

Re: Preventing topics from appearing on specific brokers

2013-12-05 Thread Philip O'Toole
We're running 0.72. Thanks, Philip On Thu, Dec 5, 2013 at 4:29 PM, Philip O'Toole phi...@loggly.com wrote: Hello, Say we are using Zookeeper-based Producers, and we specify a topic to be written to. Since we don't specify the actual brokers, is there a way to prevent a topic from

Re: Preventing topics from appearing on specific brokers

2013-12-05 Thread Philip O'Toole
, if a topic already exists on at least one broker in a cluster, it won't be created on newly added brokers. Thanks, Jun On Thu, Dec 5, 2013 at 4:29 PM, Philip O'Toole phi...@loggly.com wrote: Hello, Say we are using Zookeeper-based Producers, and we specify a topic to be written

Re: Preventing topics from appearing on specific brokers

2013-12-05 Thread Philip O'Toole
Sweet -- thanks Jun. On Thu, Dec 5, 2013 at 9:25 PM, Jun Rao jun...@gmail.com wrote: That's right. Remove the local log dir from brokers that you don't want to have the topic. Thanks, Jun On Thu, Dec 5, 2013 at 9:22 PM, Philip O'Toole phi...@loggly.com wrote: Interesting. So if we

Re: Is there a way to get the offset of a consumer of a topic?

2013-12-04 Thread Philip O'Toole
Simple tool I wrote to monitor 0.7 consumers. https://github.com/otoolep/stormkafkamon On Wed, Dec 4, 2013 at 12:49 PM, David DeMaagd ddema...@linkedin.com wrote: You can use either the MaxLag MBean (0.8): http://kafka.apache.org/documentation.html#monitoring Or the ConsumerOffsetChecker

Re: Loggly's use of Kafka on AWS

2013-12-03 Thread Philip O'Toole
On Sun, Dec 1, 2013 at 9:59 PM, Joe Stein joe.st...@stealth.ly wrote: Awesome Philip, thanks for sharing! On Sun, Dec 1, 2013 at 9:17 PM, Philip O'Toole phi...@loggly.com wrote: A couple of us here at Loggly recently spoke at AWS re:Invent, on how we use Kafka 0.72 in our ingestion

Loggly's use of Kafka on AWS

2013-12-01 Thread Philip O'Toole
A couple of us here at Loggly recently spoke at AWS re:Invent, on how we use Kafka 0.72 in our ingestion pipeline. The slides are at the link below, and may be of interest to people on this list.

Re: kafka producer - retry messages

2013-11-28 Thread Philip O'Toole
By FS I guess you mean file system. In that case, if one is that concerned, why not run a single Kafka broker on the same machine, and connect to it over localhost? And disable ZK mode too, perhaps. I may be missing something, but I never fully understood why people try really hard to build

Re: kafka producer - retry messages

2013-11-28 Thread Philip O'Toole
There are many options. Another simple consumer could read from it and write to a second broker. Philip On Nov 28, 2013, at 4:18 PM, Steve Morin steve.mo...@gmail.com wrote: Philip, How would you mirror this to a main Kafka instance? -Steve On Nov 28, 2013, at 16:14, Philip O'Toole phi

Re: kafka producer - retry messages

2013-11-28 Thread Philip O'Toole
...@gmail.com wrote: Philip, what if the broker goes down? I may be missing something. Diego. On 28/11/2013 21:09, Philip O'Toole phi...@loggly.com wrote: By FS I guess you mean file system. In that case, if one is that concerned, why not run a single Kafka broker on the same

Re: kafka producer - retry messages

2013-11-28 Thread Philip O'Toole
code structured? Have you open sourced it? On Nov 28, 2013, at 16:08, Philip O'Toole phi...@loggly.com wrote: By FS I guess you mean file system. In that case, if one is that concerned, why not run a single Kafka broker on the same machine, and connect to it over localhost? And disable

Re: 0.72 Kafka - ZK over VPN

2013-11-27 Thread Philip O'Toole
a new ZK session and new connections to the brokers. Thanks, Jun On Tue, Nov 26, 2013 at 9:33 PM, Philip O'Toole phi...@loggly.com wrote: I want to use a ZK cluster for my Kafka cluster, which is only available over a cross-country VPN tunnel. The VPN tunnel is prone to resets, every

Re: 0.72 Kafka - ZK over VPN

2013-11-27 Thread Philip O'Toole
have the logic for handling ZK session expirations. So, they should recover automatically. The issue is that if there is a real failure in the broker/consumer while the VPN is down, the failure may not be detected. Thanks, Jun On Wed, Nov 27, 2013 at 8:02 AM, Philip O'Toole phi

0.72 Kafka - ZK over VPN

2013-11-26 Thread Philip O'Toole
I want to use a ZK cluster for my Kafka cluster, which is only available over a cross-country VPN tunnel. The VPN tunnel is prone to resets, every other day or so, perhaps down for a couple of minutes at a time. Is this a concern? Any setting changes I should make to mitigate any potential

Re: kafka.common.OffsetOutOfRangeException

2013-11-19 Thread Philip O'Toole
On Tue, Nov 19, 2013 at 11:51 AM, Philip O'Toole phi...@loggly.com wrote: Don't get scared, this is perfectly normal and easily fixed. :-) The second topology attempted to fetch messages from an offset in Kafka that does not exist. This could happen due to Kafka retention policies (messages

Re: kafka.common.OffsetOutOfRangeException

2013-11-18 Thread Philip O'Toole
Don't get scared, this is perfectly normal and easily fixed. :-) The second topology attempted to fetch messages from an offset in Kafka that does not exist. This could happen due to Kafka retention policies (messages deleted) or a bug in your code. Your code needs to catch this exception, and
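
A sketch of this kind of recovery with the 0.7-era Java SimpleConsumer (broker address, topic, partition, and fetch size are hypothetical; depending on the version, the exception may surface on the fetch itself or when the returned set is first iterated):

```java
import kafka.api.FetchRequest;
import kafka.common.OffsetOutOfRangeException;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.javaapi.message.ByteBufferMessageSet;
import kafka.message.MessageAndOffset;

public class ResettingConsumer {
    public static void main(String[] args) {
        // host, port, socket timeout (ms), buffer size
        SimpleConsumer consumer = new SimpleConsumer("broker1", 9092, 10000, 64 * 1024);
        long offset = 0L;
        while (true) {
            try {
                ByteBufferMessageSet msgs =
                    consumer.fetch(new FetchRequest("my-topic", 0, offset, 300 * 1024));
                for (MessageAndOffset mo : msgs) {
                    // process mo.message(); in 0.7, mo.offset() is the offset
                    // to use for the *next* fetch
                    offset = mo.offset();
                }
            } catch (OffsetOutOfRangeException e) {
                // The messages at 'offset' were deleted by retention (or the
                // offset is simply bad): reset to the earliest offset the
                // broker still holds (time = -2), then carry on.
                offset = consumer.getOffsetsBefore("my-topic", 0, -2L, 1)[0];
            }
        }
    }
}
```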

Why would one choose a partition when producing?

2013-11-05 Thread Philip O'Toole
We use 0.72 -- I am not sure if this matters with 0.8. Why would one choose a partition, as opposed to a random partition choice? What design pattern(s) would mean choosing a partition? When is it a good idea? Any feedback out there? Thanks, Philip

Re: New error on on 0.72 Kafka brokers

2013-10-31 Thread Philip O'Toole
D'oh. Bad config on our part. Something we thought we had fixed long ago, but it crept back in. Make sure fetch sizes are big enough! Philip On Oct 31, 2013, at 7:18 PM, Philip O'Toole phi...@loggly.com wrote: We suddenly started seeing these messages from one of our consumers tonight. What

Re: Why Apache Kafka is bettern than any other Message System?

2013-10-30 Thread Philip O'Toole
On Wed, Oct 30, 2013 at 8:13 PM, Lee, Yeon Ok (이연옥) yeono...@ebay.com wrote: Hi, all. I was just curious why Apache Kafka is better than any other message system in terms of throughput and durability. Because it's brilliant, that's why. :-) What's the reason that Kafka has better

Re: How to commit offset every time, automatically

2013-10-27 Thread Philip O'Toole
You have two choices. -- Do what you say, and write your own consumer, based on the SimpleConsumer. Handle all commits, ZK accesses, and balancing yourself. -- Use a ConsumerConnector for every partition, and call commitOffsets() explicitly when you have processed a message. This does a commit
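
A sketch of the second option (0.8-era high-level consumer API; 0.7 used different property names), assuming a connector created with auto.commit.enable=false and a hypothetical process() step:

```java
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class CommitAfterProcess {
    // commitOffsets() commits every partition owned by this connector, which
    // is why one connector per partition gives the finest-grained control.
    static void consume(ConsumerConnector connector, KafkaStream<byte[], byte[]> stream) {
        ConsumerIterator<byte[], byte[]> it = stream.iterator();
        while (it.hasNext()) {
            byte[] payload = it.next().message();
            process(payload);            // application-specific work
            connector.commitOffsets();   // commit only after processing succeeds
        }
    }

    static void process(byte[] payload) { /* ... */ }
}
```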

Re: Is 30 a too high partition number?

2013-10-08 Thread Philip O'Toole
I would like to second that. It would be real useful. Philip On Oct 8, 2013, at 9:31 AM, Jason Rosenberg j...@squareup.com wrote: What I would like to see is a way for inactive topics to automatically get removed after they are inactive for a period of time. That might help in this case.

Re: Strategies for improving Consumer throughput

2013-10-02 Thread Philip O'Toole
Is this with 0.7 or 0.8? On Wed, Oct 2, 2013 at 12:59 PM, Joe Stein crypt...@gmail.com wrote: Are you sure the consumers are behind? could the pause be because the stream is empty and producing messages is what is behind the consumption? What if you shut off your consumers for 5 minutes and

Re: Topic messages with partitions=1 stored on multiple brokers

2013-09-21 Thread Philip O'Toole
that both the brokers create a new topic log for the same topic. The brokers are in different availability zones. Does that matter? Suchi On Fri, Sep 20, 2013 at 4:20 PM, Philip O'Toole phi...@loggly.com wrote: Seems to me you are confusing partitions and brokers. Partition count has

Re: Topic messages with partitions=1 stored on multiple brokers

2013-09-21 Thread Philip O'Toole
/consumer use zookeeper to discover brokers. I can clearly see in the logs (brokers) that both the brokers create a new topic log for the same topic. The brokers are in different availability zones. Does that matter? Suchi On Fri, Sep 20, 2013 at 4:20 PM, Philip O'Toole phi...@loggly.com wrote

Re: Topic messages with partitions=1 stored on multiple brokers

2013-09-20 Thread Philip O'Toole
Seems to me you are confusing partitions and brokers. Partition count has nothing to do with the number of brokers to which a message is sent -- just the number of partitions into which that message will be split when it gets to a broker. You need to explicitly set the destination brokers in the
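
For illustration, a minimal 0.7-style sketch of pinning a Producer to specific brokers via a static broker list instead of ZK discovery (broker IDs, hosts, and topic are hypothetical):

```java
import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.javaapi.producer.ProducerData;
import kafka.producer.ProducerConfig;

public class StaticBrokerProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // 0.7-era static list in "brokerid:host:port" form; messages only ever
        // go to these brokers. A ZK-based producer would set "zk.connect"
        // instead and discover whatever brokers have registered themselves.
        props.put("broker.list", "0:broker1:9092,1:broker2:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");

        Producer<String, String> producer =
            new Producer<String, String>(new ProducerConfig(props));
        producer.send(new ProducerData<String, String>("my-topic", "hello"));
        producer.close();
    }
}
```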

Re: Thanks for Kafka

2013-09-09 Thread Philip O'Toole
:49 PM, Philip O'Toole phi...@loggly.com wrote: Hello Kafka users and developers, We at Loggly launched our new system last week, and Kafka is a critical part. I just wanted to say a sincere thank-you to the Kafka team at LinkedIn who put this software together. It's really, really great

Re: Thanks for Kafka

2013-09-09 Thread Philip O'Toole
guys using 0.7 or 0.8? Jun On Mon, Sep 9, 2013 at 12:49 PM, Philip O'Toole phi...@loggly.com wrote: Hello Kafka users and developers, We at Loggly launched our new system last week, and Kafka is a critical part. I just wanted to say a sincere thank-you to the Kafka team at LinkedIn who

Re: Question on # of partitions

2013-08-29 Thread Philip O'Toole
It means the first. Philip On Thu, Aug 29, 2013 at 8:55 AM, Mark static.void@gmail.com wrote: If I have 3 brokers with 3 partitions does that mean: 1) I have 3 partitions per broker so I can have up to 9 consumers or 2) There is only 1 partition per brokers which means I can have

Re: Producer/Consumer questions 0.7

2013-08-29 Thread Philip O'Toole
On Thu, Aug 29, 2013 at 11:11 AM, Mark static.void@gmail.com wrote: Also, are the consumer offsets store in Kafka or Zookeeper? Zookeeper. On Aug 29, 2013, at 11:09 AM, Mark static.void@gmail.com wrote: 1) Should a producer be aware of which broker to write to or is this

Re: consumer question

2013-08-23 Thread Philip O'Toole
Yes, the Kafka team has told me that this is how it works (at least for 0.72). Philip On Fri, Aug 23, 2013 at 7:53 AM, Yu, Libo libo...@citi.com wrote: Hi team, Right now, from a stream, an iterator can be obtained which has a blocking hasNext(). So what is the implementation behind the

Re: Producer message ordering problem

2013-08-22 Thread Philip O'Toole
I am curious. What is it about your design that requires you track order so tightly? Maybe there is another way to meet your needs instead of relying on Kafka to do it. Philip On Aug 22, 2013, at 9:32 PM, Ross Black ross.w.bl...@gmail.com wrote: Hi, I am using Kafka 0.7.1, and using the

Re: Best partition configuration

2013-08-21 Thread Philip O'Toole
brokers in production environments, giving us a total of 24 partitions. Throughput has been superb. For integration testing however, we usually use just 1 or 2 partitions. Philip Thanks in advance! --Tom -- Philip O'Toole Senior Developer Loggly, Inc. San Francisco, CA. www.loggly.com Come

Re: Best partition configuration

2013-08-21 Thread Philip O'Toole
1 topic. I don't understand the second question. Philip On Aug 21, 2013, at 9:52 AM, Tom Brown tombrow...@gmail.com wrote: Philip, How many topics per broker (just one?) And what is the read/write profile of your setup? --Tom On Wed, Aug 21, 2013 at 12:24 PM, Philip O'Toole phi

Re: ordering

2013-08-21 Thread Philip O'Toole
No, there isn't, not at the very start when there is no state in Zookeeper. Once there is state the Kafka team have told me that rebalancing will not result in any dupes. However, if there is no state in Zookeeper and your partitions are empty, simply wait until all consumers have balanced before

Re: Lost of messages at C++ Kafka client

2013-08-07 Thread Philip O'Toole
If I understand what you are asking, I have dealt successfully with the same type of issue. It can take more than one Boost async_write() over a broken connection before the client software notices that the connection is gone. The best way to detect if a connection is broken is not by detecting

Re: Partitions per topic per broker?

2013-07-25 Thread Philip O'Toole
You set the partition-count to 100 per broker. 3 brokers. 300 partitions total. Philip On Thu, Jul 25, 2013 at 11:29 AM, Ian Friedman i...@flurry.com wrote: Hi guys, apologies in advance for the newb question: I am running a 3 broker setup, and I have a topic configured with 100 partitions

Re: Duplicate Messages on the Consumer

2013-07-18 Thread Philip O'Toole
Have you actually examined the Kafka files on disk, to make sure those dupes are really there? Or is this a case of reading the same message more than once? Philip On Thu, Jul 18, 2013 at 8:55 AM, Sybrandy, Casey casey.sybra...@six3systems.com wrote: Hello, We recently started seeing

New exception (for us) in high-level consumer code -- OK to ignore?

2013-07-10 Thread Philip O'Toole
Hello -- we're doing some heavy lifting now with our high-level based consumer. We open a Consumer Connection per partition within the one JVM, and are using Kafka 0.72. We saw a burst of the exceptions shown below. Is this something we should be concerned about? Or is this the normal output from

Re: New exception (for us) in high-level consumer code -- OK to ignore?

2013-07-10 Thread Philip O'Toole
down the consumer. Is that the case? Thanks, Jun On Wed, Jul 10, 2013 at 6:43 PM, Philip O'Toole phi...@loggly.com wrote: Hello -- we're doing some heavy lifting now with our high-level based consumer. We open a Consumer Connection per partition within the one JVM, and are using Kafka

Re: High Level Consumer error handling and clean exit

2013-07-09 Thread Philip O'Toole
It seems like you're not explicitly controlling the offsets. Is that correct? If so, the moment you pull a message from the stream, the client framework considers it processed. So if your app subsequently crashes before the message is fully processed, and auto-commit updates the offsets in

Re: High Level Consumer error handling and clean exit

2013-07-09 Thread Philip O'Toole
' call to ConsumerConnector is made. Thanks, Chris On Tue, Jul 9, 2013 at 11:21 AM, Philip O'Toole phi...@loggly.com wrote: It seems like you're not explicitly controlling the offsets. Is that correct? If so, the moment you pull a message from the stream, the client framework

Re: Is it possible to get latest offset from kafka server?

2013-06-29 Thread Philip O'Toole
Of course -- make an Offset Request. This can be done in many languages: Java, Python, C++, Ruby. -1 means get latest offset, if I remember correctly. http://people.apache.org/~joestein/kafka-0.7.1-incubating-docs/ It's just bytes on the wire, and bytes come back. Philip On Sat, Jun 29, 2013 at
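
For example, with the 0.7-era Java SimpleConsumer (broker, topic, and partition below are hypothetical), time = -1 asks for the latest offset and time = -2 the earliest:

```java
import kafka.javaapi.consumer.SimpleConsumer;

public class LatestOffset {
    public static void main(String[] args) {
        // host, port, socket timeout (ms), buffer size
        SimpleConsumer consumer = new SimpleConsumer("broker1", 9092, 10000, 64 * 1024);
        // Ask for at most one offset before "time" -1, i.e. the latest offset.
        long[] offsets = consumer.getOffsetsBefore("my-topic", 0, -1L, 1);
        System.out.println("Latest offset of my-topic/0: " + offsets[0]);
        consumer.close();
    }
}
```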

Re: about restarting kafka server

2013-06-20 Thread Philip O'Toole
Are you just killing Kafka, or Zookeeper too? On Jun 20, 2013, at 8:59 AM, Yu, Libo libo...@citi.com wrote: Hi, I have kafka running on a three host cluster. I have a script that can automatically start zookeepers on all three hosts and then start kafka servers on them. It can also kill

Re: Facebook like newsfeed using kafka

2013-06-18 Thread Philip O'Toole
You should consider using it regardless. I find 0.72 to be a great system, which is well designed and reliable. As for Storm, it depends. If you just want a simple pub-sub queue system, probably not. Philip On Jun 18, 2013, at 6:48 AM, Piyush Rai piyushra...@gmail.com wrote: I am trying

Re: message order, guarenteed?

2013-06-14 Thread Philip O'Toole
Another idea. If a set of messages arrives over a single TCP connection, route them to a partition based on that connection. To be honest, these approaches, while they work, may not scale when the message rate is high. If at all possible, try to think of a way to remove this requirement from your
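
A sketch of that routing idea against the 0.7 Partitioner interface (class name and connection-ID key are hypothetical); the producer would be given this class via the partitioner.class property and send messages keyed by connection ID:

```java
import kafka.producer.Partitioner;

// Messages sharing a connection ID always map to the same partition, which
// preserves per-connection ordering.
public class ConnectionPartitioner implements Partitioner<String> {
    @Override
    public int partition(String connectionId, int numPartitions) {
        // Mask the sign bit so the result is always a valid partition index.
        return (connectionId.hashCode() & 0x7fffffff) % numPartitions;
    }
}
```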

Re: Amazon SNS and Kafka comparison

2013-06-14 Thread Philip O'Toole
Depends how important being able to access every single bit of the messages is, right down to looking at what is on the disk. It's very important to us; we need that control. Ability to scale throughput as needed is also important - too important to do anything but run it ourselves. All these

Re: Stall high-level 0.72 ConsumerConnector until all balanced? Avoid message dupes?

2013-06-14 Thread Philip O'Toole
Philip Thanks, Jun On Thu, Jun 13, 2013 at 7:34 PM, Philip O'Toole phi...@loggly.com wrote: Hello -- is it possible for our code to stall a ConsumerConnector from doing any consuming for, say, 30 seconds, until we can be sure that all other ConsumerConnectors are rebalanced? It seems

Re: One 0.72 ConsumerConnector, multiple threads, 1 blocks. What happens?

2013-06-13 Thread Philip O'Toole
at some point. This will block the fetcher from putting the data into the other queue. Thanks, Jun On Wed, Jun 12, 2013 at 9:10 PM, Philip O'Toole phi...@loggly.com wrote: Jun -- thanks. But if the topic is the same, doesn't each thread get a partition? Isn't that how it works

Stall high-level 0.72 ConsumerConnector until all balanced? Avoid message dupes?

2013-06-13 Thread Philip O'Toole
Hello -- is it possible for our code to stall a ConsumerConnector from doing any consuming for, say, 30 seconds, until we can be sure that all other ConsumerConnectors are rebalanced? It seems that the first ConsumerConnector to come up is prefetching some data, and we end up with duplicate

Re: Stall high-level 0.72 ConsumerConnector until all balanced? Avoid message dupes?

2013-06-13 Thread Philip O'Toole
at 7:34 PM, Philip O'Toole phi...@loggly.com wrote: Hello -- is it possible for our code to stall a ConsumerConnector from doing any consuming for, say, 30 seconds, until we can be sure that all other ConsumerConnectors are rebalanced? It seems that the first ConsumerConnector to come up

Re: Stall high-level 0.72 ConsumerConnector until all balanced? Avoid message dupes?

2013-06-13 Thread Philip O'Toole
dups are expected during rebalance. In 0.8, such dups are eliminated. Other than that, rebalance shouldn't cause dups since we commit consumed offsets to ZK before doing a rebalance. Thanks, Jun On Thu, Jun 13, 2013 at 7:34 PM, Philip O'Toole phi...@loggly.com wrote: Hello

One 0.72 ConsumerConnector, multiple threads, 1 blocks. What happens?

2013-06-12 Thread Philip O'Toole
Hello -- we're using 0.72. We're looking at the source, but want to be sure. :-) We create a single ConsumerConnector, call createMessageStreams, and hand the streams off to individual threads. If one of those threads calls next() on a stream, gets some messages, and then *blocks* in some

Re: One 0.72 ConsumerConnector, multiple threads, 1 blocks. What happens?

2013-06-12 Thread Philip O'Toole
data getting into the consumer for topic 2. Thanks, Jun On Wed, Jun 12, 2013 at 7:43 PM, Philip O'Toole phi...@loggly.com wrote: Hello -- we're using 0.72. We're looking at the source, but want to be sure. :-) We create a single ConsumerConnector, call createMessageStreams, and hand

Re: One 0.72 ConsumerConnector, multiple threads, 1 blocks. What happens?

2013-06-12 Thread Philip O'Toole
12, 2013 at 9:10 PM, Philip O'Toole phi...@loggly.com wrote: Jun -- thanks. But if the topic is the same, doesn't each thread get a partition? Isn't that how it works? Philip On Wed, Jun 12, 2013 at 9:08 PM, Jun Rao jun...@gmail.com wrote: Yes, when the consumer is consuming multiple topics

Re: log file corruption when reading older messages

2013-06-10 Thread Philip O'Toole
We often replay data days old, and have never seen any issues like this. We are running 0.72. Philip On Mon, Jun 10, 2013 at 11:17 AM, Todd Bilsborrow tbilsbor...@rhythmnewmedia.com wrote: We've been running Kafka 0.7.0 in production for several months and have been quite happy. Our use case

What exactly happens if fetch size is smaller than the next batch (0.72 and high-level consumer)

2013-06-01 Thread Philip O'Toole
Hello -- I'll try to look at the code, but I'm seeing something here and I want to be *sure* I'm correct. Say a batch sitting in a 0.72 partition is 5MB in size. An instance of a high-level consumer has a configured fetch size of 300KB. This actually becomes the maxSize value, right, in

Re: What exactly happens if fetch size is smaller than the next batch (0.72 and high-level consumer)

2013-06-01 Thread Philip O'Toole
, Neha On Fri, May 31, 2013 at 11:25 PM, Philip O'Toole phi...@loggly.com wrote: Hello -- I'll try to look at the code, but I'm seeing something here and I want to be *sure* I'm correct. Say a batch sitting in a 0.72 partition is, say, 5MB in size. An instance of a high-level consumer has

Re: Relationship between Zookeeper and Kafka

2013-05-21 Thread Philip O'Toole
As a test, why not just use a disk with provisioned IOPS of 4000? Just as a test - see if it improves. Also, you have not supplied any metrics regarding the VM's performance. Is the CPU busy? Is IO maxed out? Network? Disk? Use a tool like atop, and tell us what you find. Philip On May 20,