Saumitra,
Two questions come to mind that could help you narrow down a solution:
1) How quickly do the downstream processes need the transformed data?
Reason: If you can delay processing long enough to batch the data into a
blob that is a multiple of your block size, you avoid writing many small
files to HDFS.
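To make the batching idea concrete, here is a minimal Java sketch, assuming
records are buffered in memory and flushed as one HDFS file per block's worth
of data; the class name, output path, and flush policy are illustrative, not
something from this thread:

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizedWriter {
    private final FileSystem fs;
    private final long blockSize;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private int fileCounter = 0;

    public BlockSizedWriter(Configuration conf) throws IOException {
        fs = FileSystem.get(conf);
        // Use the filesystem's default block size as the flush threshold.
        blockSize = fs.getDefaultBlockSize(new Path("/data"));
    }

    /** Buffer a record; flush a full-block file once the threshold is hit. */
    public synchronized void append(byte[] record) throws IOException {
        buffer.write(record);
        if (buffer.size() >= blockSize) {
            flush();
        }
    }

    private void flush() throws IOException {
        // Each flushed file holds roughly one block of data.
        Path out = new Path("/data/batch-" + (fileCounter++));
        try (FSDataOutputStream os = fs.create(out)) {
            os.write(buffer.toByteArray());
        }
        buffer.reset();
    }
}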
saumitra.offic...@gmail.com
To: common-user@hadoop.apache.org
Sent: Saturday, June 25, 2011 1:05 PM
Subject: Re: Queue support from HDFS
Thanks for the reply, Jakob.
As far as I understand, Kafka's Hadoop consumer is an MR job in which
mappers read from a shared Kafka queue and dump the data to HDFS, but those
mappers are not created dynamically as queue elements start bursting.
Is there a way to have new mappers created when the input queue starts to
fill up?
Not directly, but you may wish to take a look at the Kafka project
(http://sna-projects.com/kafka/), which we use as a queue and then
bring the data periodically into HDFS via an MR job. See this
presentation: http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation
-Jakob
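For reference, a minimal sketch of the periodic pull Jakob describes, written
against the modern Kafka consumer API rather than the 2011-era SimpleConsumer;
the broker address, topic, group id, batch interval, and output paths are all
assumptions for illustration:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class KafkaToHdfs {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "hdfs-loader");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        FileSystem fs = FileSystem.get(new Configuration());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));
            long batch = 0;
            while (true) {
                // Pull whatever accumulated in the queue since the last poll.
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMinutes(1));
                if (records.isEmpty()) continue;
                // Write the batch as one HDFS file; an MR job can process it later.
                Path out = new Path("/incoming/events-" + (batch++));
                try (FSDataOutputStream os = fs.create(out)) {
                    for (ConsumerRecord<String, String> r : records) {
                        os.writeBytes(r.value() + "\n");
                    }
                }
                consumer.commitSync();
            }
        }
    }
}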