Great job. +1

Warm Regards,
Tariq
cloudfront.blogspot.com

On Wed, Aug 7, 2013 at 8:27 PM, Russell Jurney <russell.jur...@gmail.com> wrote:

> Cool stuff, a Pig Kafka UDF.
>
> Russell Jurney http://datasyndrome.com
>
> Begin forwarded message:
>
> *From:* David Arthur <mum...@gmail.com>
> *Date:* August 7, 2013, 7:41:30 AM PDT
> *To:* us...@kafka.apache.org
> *Subject:* *Reading Kafka directly from Pig?*
> *Reply-To:* us...@kafka.apache.org
>
> I've thrown together a Pig LoadFunc to read data from Kafka, so you could
> load data like:
>
> QUERY_LOGS = load 'kafka://localhost:9092/logs.query#8' using
> com.mycompany.pig.KafkaAvroLoader('com.mycompany.Query');
>
> The path part of the URI is the Kafka topic, and the fragment is the number
> of partitions. In the implementation I have, it makes one input split per
> partition. Offsets are not really dealt with at this point - it's a rough
> prototype.
>
> Anyone have thoughts on whether or not this is a good idea? I know usually
> the pattern is: kafka -> hdfs -> mapreduce. If I'm only reading this
> data from Kafka once, is there any reason why I can't skip writing to HDFS?
>
> Thanks!
> -David
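
For readers following the thread, here is a rough sketch of how a location string like the one in David's example could be broken into a topic and a partition count, with one split planned per partition. It is not David's code (his loader is a Pig LoadFunc and was not posted here). The names `KafkaLocation`, `parse_kafka_location` and `plan_splits` are made up for illustration, and the only assumptions are the URI layout described in the message and that Kafka partition ids start at 0.

```python
from typing import List, NamedTuple, Tuple
from urllib.parse import urlsplit


class KafkaLocation(NamedTuple):
    """Hypothetical container for the pieces of a kafka:// load location."""
    host: str
    port: int
    topic: str
    partitions: int


def parse_kafka_location(uri: str) -> KafkaLocation:
    """Split 'kafka://host:port/topic#N' into its parts.

    Layout assumed from the message: the path is the topic and the
    fragment is the number of partitions.
    """
    parts = urlsplit(uri)
    if parts.scheme != "kafka":
        raise ValueError("expected a kafka:// URI, got: %r" % uri)
    if parts.hostname is None or parts.port is None:
        raise ValueError("broker host and port are required: %r" % uri)

    topic = parts.path.lstrip("/")
    if not topic:
        raise ValueError("missing topic in path: %r" % uri)

    if not parts.fragment.isdecimal() or int(parts.fragment) < 1:
        raise ValueError("fragment must be a positive partition count: %r" % uri)

    return KafkaLocation(parts.hostname, parts.port, topic, int(parts.fragment))


def plan_splits(location: KafkaLocation) -> List[Tuple[str, int]]:
    """Return one (topic, partition_id) pair per partition, ids starting at 0."""
    return [(location.topic, pid) for pid in range(location.partitions)]


if __name__ == "__main__":
    loc = parse_kafka_location("kafka://localhost:9092/logs.query#8")
    for topic, pid in plan_splits(loc):
        print(topic, pid)
```

Like the prototype described above, this says nothing about offsets. A real loader would also need to decide where each split starts and stops reading within its partition.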