Re: Kafka + Maven

2012-11-20 Thread Jason Rosenberg
Hmmm..none of the ones there seem like the canonical version, how do I know which of the ones published there is the one to use? (I searched for 'kafka' on there...). Jason On Tue, Nov 20, 2012 at 10:29 PM, Pierre-Yves Ritschard wrote: > For what it's worth, I also publish releases on cloj

Re: Kafka + Maven

2012-11-20 Thread Pierre-Yves Ritschard
For what it's worth, I also publish releases on clojars.org On Wed, Nov 21, 2012 at 7:23 AM, Jason Rosenberg wrote: > +100 > I've been manually creating poms and uploading jars to our nexus repo too, > not ideal at all > > On Tue, Nov 20, 2012 at 6:48 PM, Otis Gospodnetic < > otis_gospodn

Re: Kafka + Maven

2012-11-20 Thread Jason Rosenberg
+100 I've been manually creating poms and uploading jars to our nexus repo too, not ideal at all On Tue, Nov 20, 2012 at 6:48 PM, Otis Gospodnetic < otis_gospodne...@yahoo.com> wrote: > Eh, correction: I see KAFKA-133 is actually *not* marked for 0.8 release - > it's just marked as affect

Re: Thread Safety of KafkaStreams

2012-11-20 Thread Neha Narkhede
David, One KafkaStream is meant to be iterated by a single thread. A better approach is to request a higher number of streams from the Kafka consumer and let each process have its own KafkaStream. Thanks, Neha On Tue, Nov 20, 2012 at 9:40 PM, David Ross wrote: > Hello, > > We want to process mess
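
The "one stream per thread" advice above can be sketched roughly as follows. This is only an illustration, not Kafka code: each stream is modelled as a plain `Iterable<String>`, and the class and method names (`OneThreadPerStream`, `consumeAll`) are hypothetical. With the real consumer, the list would come from the consumer connector and each iterator would block while waiting for messages.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: every stream gets exactly one worker thread,
// so no stream iterator is ever shared between threads.
final class OneThreadPerStream {

    // Each Iterable<String> stands in for one KafkaStream (assumption, not the Kafka API).
    static int consumeAll(List<? extends Iterable<String>> streams) {
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, streams.size()));
        for (Iterable<String> stream : streams) {
            pool.execute(() -> {
                for (String message : stream) {
                    // The blocking per-message work would go here.
                    processed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            // Bounded wait so the sketch terminates; a real consumer loop runs until shut down.
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }
}
```

If the per-message work is slow, the parallelism comes from asking the consumer for more streams rather than from sharing one iterator.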

Thread Safety of KafkaStreams

2012-11-20 Thread David Ross
Hello, We want to process messages from a single KafkaStream in a number of processes. Is it possible to have this code executing in multiple threads against the same stream? for (message <- stream) { someBlockingOperation(message) } The scaladocs mention thread safety, but some of the code se

Re: Understanding how to monitor using JMX attributes

2012-11-20 Thread Jun Rao
The attribute getCurrentOffset gives the log end offset. It's not necessarily the log size though since older segments could be deleted. Thanks, Jun On Tue, Nov 20, 2012 at 1:12 PM, Mike Heffner wrote: > Jun, > > Do you have any idea on what the JMX attribute values on the beans " > kafka:type
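
For readers who want to read this value from code, here is a minimal sketch using the standard `javax.management` API. It assumes the getter `getCurrentOffset` surfaces as the JMX attribute `CurrentOffset` and that the bean name follows the `kafka:type=kafka.logs.{topic name}-{partition idx}` pattern quoted in this thread. The `LogStats` class below is a stand-in registered locally so the example runs on its own; it is not Kafka's bean, and `demo-topic-0` is a made-up name.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class JmxOffsetReader {

    /** Stand-in management interface (hypothetical; the real bean lives inside the broker). */
    public interface LogStatsMBean {
        long getCurrentOffset();
    }

    /** Stand-in implementation that reports a fixed offset. */
    public static final class LogStats implements LogStatsMBean {
        private final long offset;
        public LogStats(long offset) { this.offset = offset; }
        @Override public long getCurrentOffset() { return offset; }
    }

    /** Reads the CurrentOffset attribute (the log end offset) from the named bean. */
    static long readCurrentOffset(MBeanServer server, String beanName) {
        try {
            Object value = server.getAttribute(new ObjectName(beanName), "CurrentOffset");
            return ((Number) value).longValue();
        } catch (Exception e) {
            throw new IllegalStateException("could not read CurrentOffset from " + beanName, e);
        }
    }

    /** Registers the stand-in bean in the local platform MBean server and reads it back. */
    static long demo() {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        String name = "kafka:type=kafka.logs.demo-topic-0"; // hypothetical topic and partition
        try {
            ObjectName objectName = new ObjectName(name);
            if (!server.isRegistered(objectName)) {
                server.registerMBean(new LogStats(123456L), objectName);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return readCurrentOffset(server, name);
    }
}
```

Against a remote broker the `MBeanServer` would be replaced by a remote JMX connection; that part is left out here.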

Re: accessing stats programmatically (instead of via jmx)

2012-11-20 Thread Jason Rosenberg
Nice, ok, I need to start using 0.8 (is there a semi-stable revision to start playing with?). On Tue, Nov 20, 2012 at 2:45 PM, Jay Kreps wrote: > In 0.7 there is no other way to access stats remotely. Technically the JMX > is accessible so you can certainly start the broker yourself >new Kaf

Re: Kafka + Maven

2012-11-20 Thread Otis Gospodnetic
Eh, correction: I see KAFKA-133 is actually *not* marked for 0.8 release - it's just marked as affecting the 0.8 release. :( Otis Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm  > > From: Otis Gospodnetic >To: "kafka-us

Re: Kafka + Maven

2012-11-20 Thread Otis Gospodnetic
Pretty pretty pretty please please please from us at Sematext, too.  I provided the instructions in KAFKA-133: https://issues.apache.org/jira/browse/KAFKA-133?focusedCommentId=13500822&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13500822 I've also pinged zkclie

Re: Newbie question

2012-11-20 Thread David Arthur
I believe the (bare minimum) runtime deps are: kafka, scala-library, zookeeper, and zkclient. Also snappy if you want snappy support. HTH, David On Nov 20, 2012, at 5:44 PM, Jamie Wang wrote: > Hi, > > I am new to using Kafka. I read all the documentations and followed the > quickstart steps

Re: Kafka + Maven

2012-11-20 Thread David Arthur
+1 We have kafka + deps defined in a custom Ivy repo On Nov 20, 2012, at 7:18 PM, Matthew Rathbone wrote: > ++ to both Maven packages and multiple Scala versions. > > As above, we host our own 2.9.2 build in Nexus. Seems crazy everyone is > doing the same thing and constantly repeating work. >

Re: Kafka + Maven

2012-11-20 Thread Brian O'Neill
+1, pretty please, please, please. (we also use Storm, and would love to see published artifacts) We use sonatype to publish our open source artifacts: https://docs.sonatype.org/display/Repository/Sonatype+OSS+Maven+Repository+Usage+Guide It's fairly straightforward. I can help out if you need it

Re: Kafka + Maven

2012-11-20 Thread Matthew Rathbone
++ to both Maven packages and multiple Scala versions. As above, we host our own 2.9.2 build in Nexus. Seems crazy everyone is doing the same thing and constantly repeating work. On Tue, Nov 20, 2012 at 5:42 PM, Roman Garcia wrote: > +1 > We also host our packages (kafka-scala28 and kafka-scala

Re: Kafka + Maven

2012-11-20 Thread Roman Garcia
+1 We also host our packages (kafka-scala28 and kafka-scala292) on our Nexus server Multiple Scala versions support would be nice as well. On Tue, Nov 20, 2012 at 9:36 PM, Evan Chan wrote: > +1. > We hosted our own built version of Kafka on our Nexus server as well. > > -Evan > > > On Tue, Nov

Re: Kafka + Maven

2012-11-20 Thread Evan Chan
+1. We hosted our own built version of Kafka on our Nexus server as well. -Evan On Tue, Nov 20, 2012 at 3:27 PM, Chris Riccomini wrote: > Hey Guys, > > I was talking with Jay, and he recommended I forward some feedback along. > > I have been playing with Kafka 0.8 this week, and am feeling the

Kafka + Maven

2012-11-20 Thread Chris Riccomini
Hey Guys, I was talking with Jay, and he recommended I forward some feedback along. I have been playing with Kafka 0.8 this week, and am feeling the pain in the lack of Maven support for it. Specifically, it'd be nice if this stuff were: 1. In Apache's SNAPSHOT repository 2. In some relea

Re: accessing stats programmatically (instead of via jmx)

2012-11-20 Thread Jay Kreps
In 0.7 there is no other way to access stats remotely. Technically the JMX is accessible so you can certainly start the broker yourself new KafkaServer(...) and just add a wrapper that calls the methods you are interested in, but if you are doing this from java it may be a bit awkward to reach i

Newbie question

2012-11-20 Thread Jamie Wang
Hi, I am new to using Kafka. I read all the documentation and followed the quickstart steps. I was able to run the sample kafka system. Looking through the Kafka directories extracted from the tar file, there are a lot of subdirectories. I am wondering if they are all really needed to run ka

accessing stats programmatically (instead of via jmx)

2012-11-20 Thread Jason Rosenberg
Hi, I would like to expose some of the kafka stats that appear in the current kafka jmx mbeans. In our system we are using the yammer metrics library (instead of polling jmx), so I'd like to wrap the stats and expose them as yammer metrics elements, etc. Looking at the code, it doesn't seem easy

Re: Understanding how to monitor using JMX attributes

2012-11-20 Thread Mike Heffner
Jun, Do you have any idea on what the JMX attribute values on the beans " kafka:type=kafka.logs.{topic name}-{partition idx}" represent then? It seems like these should correctly represent the current offsets of the producer logs? They appeared to track correctly for a while, but once the log size

Re: Understanding how to monitor using JMX attributes

2012-11-20 Thread Mike Heffner
Evan, That's correct. The Storm ZK consumer path for us is: /{prefix}/{spout name}/10.x.x.x:9092:{partition} and is a JSON blob. ConsumerOffsetChecker would then not work for this. Mike On Tue, Nov 20, 2012 at 12:11 PM, Evan Chan wrote: > Mike, > > I'm not sure the Storm-bundled kafka store

Re: Kafka Broker Configuration Tuning and Repartitioning topic

2012-11-20 Thread Jay Kreps
I think this may be a terminology issue. By "re-partitioning" I think Neha means taking data currently on disk and splitting it into a different number of partitions on different servers. We can't really do this because the partition function is something computed on the client. A different issue
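
To make the point about the client-side partition function concrete, here is a small sketch. It assumes a plain hash-modulo scheme purely for illustration (the actual partitioner is pluggable and is not shown in this thread), and the class name `ClientSidePartitioner` is hypothetical.

```java
// Hypothetical hash-modulo partitioner, used only to show why data already on disk
// cannot simply be re-split: the key-to-partition mapping depends on the partition count.
final class ClientSidePartitioner {

    static int partitionFor(Object key, int numPartitions) {
        // Mask the sign bit so the result is never negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }
}
```

With this scheme the integer key 10 lands in partition 2 when there are 4 partitions and in partition 4 when there are 6, so messages written before a change in partition count stay where the old mapping put them.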

Re: async producer behavior if zk and/or kafka cluster goes away...

2012-11-20 Thread Neha Narkhede
Docs are not updated since 0.8 is not yet released. Thanks, Neha On Tue, Nov 20, 2012 at 11:09 AM, Jason Rosenberg wrote: > Is there a configuration doc page for 0.8 (since apparently there are some > new settings)? > > Jason > > On Tue, Nov 20, 2012 at 10:39 AM, Jun Rao wrote: > >> That's righ

Re: Kafka Broker Configuration Tuning and Repartitioning topic

2012-11-20 Thread Muthukumar
Hi Neha, Thanks for the response. We're currently working to integrate with the exposed mbeans using collectors and monitor them. Since repartitioning is not supported, it would be great to know whether we can move the files from one partition to another so that they get picked up. Will that work? Noads-8: total 9886868

Re: async producer behavior if zk and/or kafka cluster goes away...

2012-11-20 Thread Jason Rosenberg
Is there a configuration doc page for 0.8 (since apparently there are some new settings)? Jason On Tue, Nov 20, 2012 at 10:39 AM, Jun Rao wrote: > That's right. VIP is only used for getting metadata. All producer send > requests are through direct RPC to each broker. > > Thanks, > > Jun > > On

Re: some healthy broker disappear from zookeeper

2012-11-20 Thread Neha Narkhede
ZooKeeper server version 3.3.3 is pretty buggy and has known session expiration and unexpected ephemeral node deletion bugs. Please upgrade to 3.3.4 and retry. Thanks, Neha On Tue, Nov 20, 2012 at 10:42 AM, Xiaoyu Wang wrote: > Hello everybody, > > We have run into this problem a few times in

some healthy broker disappear from zookeeper

2012-11-20 Thread Xiaoyu Wang
Hello everybody, We have run into this problem a few times in the past week. The symptom is that some brokers disappear from zookeeper. The brokers appear to be healthy. After that, producers start producing lots of ZK producer cache stale logs and stop making any progress. "logger.info("Try #" + numRet

Re: Suggestion on ZkClient usage in Kafka

2012-11-20 Thread Jun Rao
You can try to put all brokers in a vip and expose the vip to the producer. If there is no vip, it takes the same amount of effort as moving a zk cluster to a new set of hosts. Thanks, Jun On Tue, Nov 20, 2012 at 10:20 AM, David Arthur wrote: > If I understand correctly, the brokers stay informed

Re: async producer behavior if zk and/or kafka cluster goes away...

2012-11-20 Thread Jun Rao
That's right. VIP is only used for getting metadata. All producer send requests are through direct RPC to each broker. Thanks, Jun On Tue, Nov 20, 2012 at 10:28 AM, Jason Rosenberg wrote: > Ok, > > I think I understand (so I'll need to change some things in our set up to > work with 0.8). > >

Re: async producer behavior if zk and/or kafka cluster goes away...

2012-11-20 Thread Neha Narkhede
>> So the VIP is only for getting meta-data? After that, under the covers, the producers will make direct connections to individual kafka hosts that they learned about from connecting through the VIP That's right. Thanks for your questions ! On Tue, Nov 20, 2012 at 10:28 AM, Jason Rosenberg wr

Re: async producer behavior if zk and/or kafka cluster goes away...

2012-11-20 Thread Jason Rosenberg
Ok, I think I understand (so I'll need to change some things in our set up to work with 0.8). So the VIP is only for getting meta-data? After that, under the covers, the producers will make direct connections to individual kafka hosts that they learned about from connecting through the VIP? Jas

Re: Suggestion on ZkClient usage in Kafka

2012-11-20 Thread David Arthur
If I understand correctly, the brokers stay informed about one another through ZooKeeper and therefore any broker can give info about any other broker? This is an interesting approach. What would happen if your broker list changed dramatically over time? On Nov 20, 2012, at 1:02 PM, Neha Narkhe

Re: async producer behavior if zk and/or kafka cluster goes away...

2012-11-20 Thread Jay Kreps
I think the confusion is that we are answering a slightly different question than what you are asking. If I understand, you are asking, "do I need to put ALL the kafka broker urls into the config for the client and will this need to be updated if I add machines to the cluster?". The answer to both

Re: async producer behavior if zk and/or kafka cluster goes away...

2012-11-20 Thread Jason Rosenberg
On Tue, Nov 20, 2012 at 10:00 AM, Neha Narkhede wrote: > > By requiring use of a configured broker.list for each client, means that > > 1000's of deployed apps need to be updated any time the kafka cluster > > changes, no? (Or am I not understanding?). > > The advantage is that you can configure

Re: Suggestion on ZkClient usage in Kafka

2012-11-20 Thread Neha Narkhede
This is being discussed in another thread - http://markmail.org/message/mypnt7sgkqt55jb2?q=Jason+async+producer Basically, you want zookeeper on the producer to do just one thing - notify the change in the liveness of brokers in Kafka cluster. In 0.8, brokers are not the entity to worry about, wha

Re: async producer behavior if zk and/or kafka cluster goes away...

2012-11-20 Thread Neha Narkhede
> By requiring use of a configured broker.list for each client, means that > 1000's of deployed apps need to be updated any time the kafka cluster > changes, no? (Or am I not understanding?). The advantage is that you can configure broker.list to point to a VIP, so you can transparently change th

Re: Suggestion on ZkClient usage in Kafka

2012-11-20 Thread Bae, Jae Hyeon
In the case that producer does not require zk.connect, how can the producer recognize the new brokers or brokers which went down? On Tue, Nov 20, 2012 at 8:31 AM, Jun Rao wrote: > David, > > The change in 0.8 is that instead of requiring zk.connect, we require > broker.list. In both cases, you ty

Re: async producer behavior if zk and/or kafka cluster goes away...

2012-11-20 Thread Jason Rosenberg
Ok, So, I'm still wrapping my mind around this. I liked being able to use zk for all clients, since it made it very easy to think about how to update the kafka cluster. E.g. how to add new brokers, how to move them all to new hosts entirely, etc., without having to redeploy all the clients. The

Re: createMessageStreamsByFilter unexpected behaviour

2012-11-20 Thread Jun Rao
This is likely caused by https://issues.apache.org/jira/browse/KAFKA-550. The fix has been checked into trunk. Thanks, Jun On Tue, Nov 20, 2012 at 4:44 AM, Michal Haris wrote: > Hi, I am seeing behaviour which I am not expecting when using topic > filters. > > TopicFilter sourceTopicFilter = ne

Re: Kafka on EC2

2012-11-20 Thread Evan Chan
We use m1.large's with ephemeral storage and get 20MB/sec using Kafka's built in benchmarking tool. No compression. On Tue, Nov 20, 2012 at 7:52 AM, David Arthur wrote: > In my experience, anything smaller than m1.xlarge isn't really suitable > for I/O intensive high performance stuff. I would

Re: Understanding how to monitor using JMX attributes

2012-11-20 Thread Jun Rao
The tool gets the end offset of the log using getOffsetBefore and the consumer offset from ZK. It then calculates the lag. We do have a JMX for lag in ZookeeperConsumerConnector. The api is the following, but you need to provide topic/brokerid/partitionid. /** * JMX interface for monitoring con
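
The lag calculation described above comes down to a subtraction. The sketch below shows only that arithmetic; fetching the two inputs (the log end offset via getOffsetBefore and the consumer offset from ZK) is not shown, and the class name `ConsumerLag` is hypothetical.

```java
// Hypothetical helper: lag for one topic/broker/partition from two offsets
// that the caller has already obtained.
final class ConsumerLag {

    static long lag(long logEndOffset, long consumerOffset) {
        // The two offsets are read at different moments, so clamp at zero
        // rather than report a negative lag.
        return Math.max(0L, logEndOffset - consumerOffset);
    }
}
```

For example, a log end offset of 1500 and a consumer offset of 1200 gives a lag of 300.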

Re: Understanding how to monitor using JMX attributes

2012-11-20 Thread Evan Chan
Mike, I'm not sure the Storm-bundled kafka stores offsets in the same ZK locations as the regular Kafka consumer. Actually if you can verify the location that would be great, cuz I'm curious. Anyways the ConsumerOffsetChecker would not be able to help if the ZK locations were different. -Ev

Re: async producer behavior if zk and/or kafka cluster goes away...

2012-11-20 Thread Jun Rao
Jason, Auto discovery of new brokers and rolling restart of brokers are still supported in 0.8. It's just that most of the ZK related logic is moved to the broker. There are 2 reasons why we want to remove zkclient from the client. 1. If the client goes to GC, it can cause zk session expiration

Re: async producer behavior if zk and/or kafka cluster goes away...

2012-11-20 Thread Neha Narkhede
Trunk does not have the latest 0.8 code yet. We plan to merge 0.8 back into trunk soon, but it hasn't happened yet. Typically, the number of producers to a production Kafka cluster is very large, which means a large number of connections to zookeeper. If there is a slight blip on the zookeeper cluster d

Re: Suggestion on ZkClient usage in Kafka

2012-11-20 Thread Jun Rao
David, The change in 0.8 is that instead of requiring zk.connect, we require broker.list. In both cases, you typically provide a list of hosts and ports. Functionality wise, they achieve the same thing, ie, the producer is able to send the data to the right broker. Are you saying that zk.connect i
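
As a rough illustration of the configuration change discussed here, the sketch below builds the two variants of producer properties. Only the property names `zk.connect` and `broker.list` come from this thread; the class name, the host names in the usage note, and the exact value format are assumptions and may differ between versions.

```java
import java.util.Properties;

// Hypothetical helper contrasting the two ways of pointing a producer at a cluster.
final class ProducerConfigSketch {

    /** Older style discussed in the thread: the producer discovers brokers through ZooKeeper. */
    static Properties zkBased(String zkConnect) {
        Properties props = new Properties();
        props.setProperty("zk.connect", zkConnect);
        return props;
    }

    /** 0.8 style discussed in the thread: the producer is given brokers (or a VIP) directly. */
    static Properties brokerListBased(String brokerList) {
        Properties props = new Properties();
        props.setProperty("broker.list", brokerList);
        return props;
    }
}
```

A call such as `ProducerConfigSketch.brokerListBased("kafka-vip.example.com:9092")` (a made-up VIP address) would match the setup suggested elsewhere in this thread, where the VIP is used only to fetch metadata.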

Re: Kafka Broker Configuration Tuning and Repartitioning topic

2012-11-20 Thread Neha Narkhede
Muthu, a) Not as of now. Please feel free to create the JIRA and specify the details there b) I doubt increasing partitions will help. 500 GB/day/topic suggests the data per partition is only 10 GB/day. Before thinking about increasing the # of partitions, I would try a few things- 1. Inspect th

Re: Suggestion on ZkClient usage in Kafka

2012-11-20 Thread Jun Rao
In 0.8, both the broker and the consumer still need zkclient. So, a zk cluster is still needed. Thanks, Jun On Tue, Nov 20, 2012 at 8:04 AM, Jason Rosenberg wrote: > Agreed, I'm not sure I understand the move away from zk. Is it still > required for consumers, and for the brokers themselves?

Re: Suggestion on ZkClient usage in Kafka

2012-11-20 Thread Jason Rosenberg
Agreed, I'm not sure I understand the move away from zk. Is it still required for consumers, and for the brokers themselves? If so, we still need to deploy a zk cluster anyway. Will kafka now support coordinating 1000's of producer clients? Jason On Tue, Nov 20, 2012 at 7:54 AM, David Arthur

Re: Understanding how to monitor using JMX attributes

2012-11-20 Thread Mike Heffner
I have not tried that yet, I was hoping to use an existing Ruby monitoring process that we use to monitor several other existing resources. I also don't want to make changes to the Kafka consumer code, as it's part of a bundled package (Storm). Where does ConsumerOffsetChecker pull its informatio

Re: async producer behavior if zk and/or kafka cluster goes away...

2012-11-20 Thread Jason Rosenberg
I checked out trunk. I guess I assumed that included the latest 0.8. Is that not right? Am I just looking at 0.7.x+? Honestly, I don't think it would be a positive thing not to be able to rely on zookeeper in producer code. How does that affect the discovery of a kafka cluster under dynamic co

Re: Suggestion on ZkClient usage in Kafka

2012-11-20 Thread David Arthur
On Nov 20, 2012, at 12:23 AM, Jun Rao wrote: > Jason, > > In 0.8, producer doesn't use zkclient at all. You just need to set > broker.list. This seems like a regression in functionality. For me, one of the benefits of Kafka is only needing to know about ZooKeeper > A number of things have cha

Re: Kafka on EC2

2012-11-20 Thread David Arthur
In my experience, anything smaller than m1.xlarge isn't really suitable for I/O intensive high performance stuff. I would guess that, for Kafka, a single m1.xlarge would outperform two m1.large. I have no hard evidence to support this however. What I'd like to see are some benchmarks comparing

createMessageStreamsByFilter unexpected behaviour

2012-11-20 Thread Michal Haris
Hi, I am seeing behaviour which I am not expecting when using topic filters. TopicFilter sourceTopicFilter = new Whitelist("pageviews"); List<KafkaStream<Message>> streams = consumer.createMessageStreamsByFilter(sourceTopicFilter, 3); The topic has exactly 3 partitions and 3 streams are created, however only the last

Re: Kafka Broker Configuration Tuning and Repartitioning topic

2012-11-20 Thread Muthukumar
Hi Jun, Thanks for the response. a) Is there any plan in the roadmap to address this re-partition or partition balance with new partitions? Please let me know to have the JIRA for this. b) Do we need to go for more partitions for the topic6 (46 to ??) to reduce the new requests + backlog. -Muth