Re: S3 Consumer

2012-12-28 Thread Chetan Conikee
Noticed this s3 based consumer project on github https://github.com/razvan/kafka-s3-consumer On Dec 27, 2012, at 7:08 AM, David Arthur wrote: > I don't think anything exists like this in Kafka (or contrib), but it would > be a useful addition! Personally, I have written this exact thing at p

Kafka talk and meetup in ApacheCon

2012-12-28 Thread Jun Rao
Hi, Everyone, Just want to let people know that there will be a Kafka presentation (focusing on replication) at ApacheCon in Feb 2013. http://na.apachecon.com/schedule/presentation/115/ We also plan to have a Kafka meetup. http://wiki.apache.org/apachecon/ApacheMeetupsNA13 Please sign up for Apa

Re: S3 Consumer

2012-12-28 Thread Russell Jurney
Would you please contribute this to open source? What you've written has been asked for many times. FWIW, I would immediately incorporate it into my book, Agile Data. Russell Jurney http://datasyndrome.com On Dec 28, 2012, at 8:06 AM, Liam Stewart wrote: > We have a tool that reads data continu

Re: anecdotal uptime and service monitoring

2012-12-28 Thread Jun Rao
At LinkedIn, the most common failure of a Kafka broker is when we have to deploy new Kafka code/config. Otherwise, the broker can be up for a long time (e.g., months). It would be good to monitor the following metrics at the broker: log flush time/rate, produce/fetch requests/messages rate, GC rate/
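The request/message rates Jun suggests watching can be tracked with a simple counter over a wall-clock window; a minimal sketch (the `RateTracker` class is illustrative, not Kafka's actual metrics API):

```python
import time

class RateTracker:
    """Tracks events-per-second since the tracker was created."""

    def __init__(self):
        self.count = 0
        self.start = time.time()

    def record(self, n=1):
        # Call once per produce/fetch request (or per message batch).
        self.count += n

    def rate(self):
        elapsed = time.time() - self.start
        return self.count / elapsed if elapsed > 0 else 0.0
```

In practice Kafka exposes these numbers over JMX, so a tracker like this would only be needed on the collecting side.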

Re: automatic producer balancing over brokers

2012-12-28 Thread Jun Rao
This is a known bug in Kafka 0.7.x. Basically, for a new topic, we bootstrap using all existing brokers. However, if a topic already exists on some brokers, we never bootstrap again, which means new brokers will be ignored. For now, you have to manually create the topic on the new brokers (e.g., by

Re: A "Java heap space" question

2012-12-28 Thread Jun Rao
Then, compression won't help. Try increasing the heap size. If that doesn't help, you may need to use more brokers. Thanks, Jun On Thu, Dec 27, 2012 at 10:26 PM, xingcan wrote: > Jun, > > Our messages are not plain text. Most of them are JPEG files. I'm not sure > if compression will be useful
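Jun's point that compression won't help for JPEG payloads is easy to verify: JPEG data is already entropy-coded, so it looks like random bytes to a general-purpose compressor (random bytes stand in for JPEG data in this sketch):

```python
import gzip
import os

# Random bytes approximate an already-compressed payload such as a JPEG.
jpeg_like = os.urandom(100_000)
compressed = gzip.compress(jpeg_like)

# gzip adds framing overhead, so the "compressed" copy can even be
# slightly larger than the original.
ratio = len(compressed) / len(jpeg_like)
print(f"size ratio: {ratio:.3f}")
```

Enabling Kafka's producer-side compression on such payloads would only burn CPU without reducing the heap or network load.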

Re: Kafka Node.js Integration Questions/Advice

2012-12-28 Thread Christopher Alexander
Update: Early (1 week) implementation of Node-Kafka has resulted in the following observations: 1. Consumer is unstable. 2. If use of Consumer is mandatory, create the Consumer in application-scope, not request-scope. 3. Attempt to close Consumer on application shutdown. Results of unplanned sh
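Points 2 and 3 above — one application-scoped consumer, closed as cleanly as possible on shutdown — apply to most Kafka clients, not just node-kafka; a hedged Python sketch (the `Consumer` class here is a stand-in, not the node-kafka API):

```python
import atexit

class Consumer:
    """Stand-in for a Kafka consumer client connection."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

# Create the consumer once, at application scope -- not per request.
consumer = Consumer()

# Best-effort close on normal shutdown; unplanned shutdowns (kill -9,
# crashes) will still leave the connection to be cleaned up server-side.
atexit.register(consumer.close)
```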

Producer messages balance over servers

2012-12-28 Thread Samuel García Martínez
Hi! I'm playing with kafka using the following setup: 3 zk nodes ensemble 2 brokers: * num_partitions:3 * topic.partition.count.map=test-topic:5 My producer connects to brokers using zk.connect. When the producer sends messages to the "test-topic" topic, the partitions are created on both b

Re: S3 Consumer

2012-12-28 Thread Liam Stewart
We have a tool that reads data continuously from brokers and then writes files to S3. A MR job didn't make sense for us given our current size and volume. We have one instance running right now and could add more if needed, adjusting which instance reads from which brokers/topics/... Unfortunate

Re: S3 Consumer

2012-12-28 Thread Pratyush Chandra
Hi Matthew, I may be doing something wrong. I cloned the code at https://github.com/apache/kafka/tree/trunk/contrib/hadoop-consumer I am running following : - ./run-class.sh kafka.etl.impl.DataGenerator test/test.properties which generates a /tmp/kafka/data/1.dat file containing Dump tcp://local

Re: S3 Consumer

2012-12-28 Thread Matthew Rathbone
So the hadoop consumer does use the latest offset, it reads it from the 'input' directory in the record reader. We have a heavily modified version of the hadoop consumer that reads / writes offsets to zookeeper [much like the scala consumers] and this works great. FWIW we also use the hadoop cons

Re: S3 Consumer

2012-12-28 Thread Pratyush Chandra
I went through the source code of the Hadoop consumer in contrib. It doesn't seem to be using the previous offset at all, neither in the DataGenerator nor in the MapReduce stage. Before I go into the implementation, I can think of 2 ways : 1. A ConsumerConnector receiving all the messages continuously, and then
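The first option sketched above — a long-running consumer that batches messages and records the last offset it handled — might look like this (in-memory stand-ins for the message stream and the offset store; in practice the store would be ZooKeeper and each batch would be written to S3):

```python
def consume_batches(messages, batch_size, offset_store):
    """Group (offset, payload) pairs into batches and record progress.

    `messages` stands in for a stream from a ConsumerConnector;
    `offset_store` is any dict-like store (ZooKeeper in practice).
    """
    batch = []
    for offset, payload in messages:
        batch.append(payload)
        if len(batch) >= batch_size:
            yield batch                    # e.g. upload this batch to S3
            offset_store["last"] = offset  # commit only after the upload
            batch = []
    if batch:                              # flush any trailing partial batch
        yield batch
        offset_store["last"] = messages[-1][0]

store = {}
msgs = [(i, f"m{i}") for i in range(5)]
print(list(consume_batches(msgs, 2, store)), store["last"])
```

Committing the offset only after a batch is safely written gives at-least-once delivery: a crash mid-batch re-reads from the last committed offset rather than losing data.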

automatic producer balancing over brokers

2012-12-28 Thread Samuel García Martínez
Hi! I'm playing with kafka 0.7.2 using the following setup: 3 zk nodes ensemble 2 brokers: * num_partitions:3 * topic.partition.count.map=test-topic:5 My producer connects to brokers using "zk.connect" property. When the producer sends messages to the "test-topic" topic, the partitions are