Re: Using Kafka for data messages

2013-06-14 Thread Mahendra M
Hi Josh, thanks for clarifying the use case. The idea is good, but I see the following three issues: 1. Creating a queue for each user; there could be limits on this. 2. Removing old queues. 3. If the same user logs in from multiple browsers, things get a bit more complex. Can I

Re: Using Kafka for data messages

2013-06-14 Thread Archie Cowan
Hi there, I'm very new to Kafka but am also keen on this use case. Is the number of topics limited only by the underlying filesystem's constraints on the number of files in one directory? There are other filesystems out there that have practical limits in the range of millions (though programs like

Re: Versioning Schemas

2013-06-14 Thread David Arthur
I've done this in the past, and it worked out well. Stored Avro schema in ZooKeeper with an integer id and prefixed each message with the id. You have to make sure when you register a new schema that it resolves with the current version (ResolvingDecoder helps with this). -David On 6/13/13
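
A minimal sketch of the framing described above, assuming a 4-byte big-endian integer id placed in front of each payload. The in-memory dict only stands in for the schema store kept in ZooKeeper, and the names SCHEMAS, frame, and unframe are hypothetical. Avro encoding and the ResolvingDecoder compatibility check are left out.

```python
import struct

# Hypothetical stand-in for the schema store in ZooKeeper: integer id -> schema JSON.
SCHEMAS = {
    1: '{"type": "record", "name": "Doc", "fields": [{"name": "id", "type": "string"}]}',
}

# Assumed header layout: one unsigned 4-byte big-endian schema id.
HEADER = struct.Struct(">I")


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix an already-encoded payload with the id of the schema that wrote it."""
    if schema_id not in SCHEMAS:
        raise KeyError(f"schema id {schema_id} is not registered")
    return HEADER.pack(schema_id) + payload


def unframe(message: bytes):
    """Split a framed message into (schema id, writer schema, payload)."""
    (schema_id,) = HEADER.unpack_from(message)
    return schema_id, SCHEMAS[schema_id], message[HEADER.size:]
```

A consumer would call unframe on each message, look up the writer schema by id, and only then decode the payload.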

Re: message order, guaranteed?

2013-06-14 Thread David Arthur
Simple example of how to take advantage of this behavior: Suppose you're sending document updates through Kafka. If you use the document ID as the message key and the default hash partitioner, the updates for a given document will exist on the same partition and come into the consumer in
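
A rough sketch of the idea above: hash the message key and take it modulo the partition count, so every update for one document lands on the same partition in send order. The function name partition_for, the CRC32 hash, and the partition count of 4 are assumptions for illustration; this is not Kafka's own partitioner.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition. CRC32 is a stand-in hash, not the one Kafka uses."""
    return zlib.crc32(key) % num_partitions


# Updates in the order the producer sends them: (document id, version).
updates = [("doc-1", "v1"), ("doc-2", "v1"), ("doc-1", "v2"), ("doc-1", "v3")]

# Simulate per-partition logs: each partition keeps messages in arrival order.
partitions = {}
for doc_id, version in updates:
    p = partition_for(doc_id.encode("utf-8"), 4)
    partitions.setdefault(p, []).append((doc_id, version))
```

Because the mapping depends only on the key, a consumer reading a single partition sees all versions of a given document in the order they were produced.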

Amazon SNS and Kafka comparison

2013-06-14 Thread James Newhaven
Hi, I have a system that needs to process tens of thousands of user events per second. I've looked at both Kafka and Amazon SNS. Using SNS would mean I can avoid the operational overhead of maintaining and monitoring Kafka and ZooKeeper installations. I also wouldn't need to worry about storage

Re: message order, guaranteed?

2013-06-14 Thread Philip O'Toole
Another idea: if a set of messages arrives over a single TCP connection, route to a partition depending on the TCP connection. To be honest, these approaches, while they work, may not scale when the message rate is high. If at all possible, try to think of a way to remove this requirement from your

Re: Amazon SNS and Kafka comparison

2013-06-14 Thread Philip O'Toole
Depends how important it is to be able to access every single bit of the messages, right down to looking at what is on the disk. It's very important to us; we need that control. The ability to scale throughput as needed is also important - too important to do anything but run it ourselves. All these

0.8 backup strategy anyone?

2013-06-14 Thread Scott Clasen
So despite 0.8 being a release that will give much higher availability, do people do anything at all to back up the data? For instance, if any of you are running on EC2 and using ephemeral disk for perf reasons, what do you do about messages that you absolutely can't afford to lose? Basically

Re: Kafka 0.8 Maven and IntelliJ

2013-06-14 Thread Dragos Manolescu
I use 12.1.4 Ultimate on OS X. -Dragos On 6/13/13 9:07 PM, Jun Rao jun...@gmail.com wrote: Thanks. Which version of IntelliJ are you using? Jun On Thu, Jun 13, 2013 at 10:20 AM, Dragos Manolescu dragos.manole...@servicenow.com wrote: Hmm, I've just pulled 0.8.0-beta1-candidate1, removed

Re: Stall high-level 0.72 ConsumerConnector until all balanced? Avoid message dupes?

2013-06-14 Thread Philip O'Toole
On Thu, Jun 13, 2013 at 9:15 PM, Jun Rao jun...@gmail.com wrote: Are your messages compressed in batches? If so, some dups are expected during rebalance. In 0.8, such dups are eliminated. Other than that, rebalance shouldn't cause dups since we commit consumed offsets to ZK before doing a

Re: Using Kafka for data messages

2013-06-14 Thread Josh Foure
Hi Mahendra, thanks for your reply. I was planning on using the Atmosphere Framework (http://async-io.org/) to handle the web push stuff (I've never used it before, but we use PrimeFaces a little and that's what they use for their components). I thought that I would have the JVM that the user