Queues in the capacity scheduler are logical data structures into which
MapReduce jobs are placed to be picked up by the JobTracker / Scheduler
framework, according to some capacity constraints that can be defined for a
queue.
So, given your use case, I don't think the Capacity Scheduler is going to help here.
There is also Kafka (http://kafka.apache.org), a high-throughput, distributed,
publish-subscribe messaging system. But it does not push into HDFS; you need to
launch a job to pull the data in.
Regards
Bertrand
On Fri, Jan 11, 2013 at 1:52 PM, Mirko Kämpf mirko.kae...@gmail.com wrote:
I would
He's got two different queues.
1) a queue in the capacity scheduler, so he can have a set of M/R tasks running in the
background to pull data off of...
2) a durable queue that receives the inbound json files to be processed.
You can have a custom-written listener that pulls data from the queue.
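For queue 1), the capacity-scheduler queue is defined in configuration. A minimal sketch for Hadoop 1.x (MRv1) follows; the queue name "ingest" and the 30% capacity figure are assumptions for illustration, not something from this thread:

```xml
<!-- mapred-site.xml: enable the capacity scheduler and declare the queues.
     The queue name "ingest" is hypothetical. -->
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
<property>
  <name>mapred.queue.names</name>
  <value>default,ingest</value>
</property>

<!-- capacity-scheduler.xml: give the ingest queue a share of cluster slots. -->
<property>
  <name>mapred.capacity-scheduler.queue.ingest.capacity</name>
  <value>30</value>
</property>
```

Jobs are then submitted into that queue by setting mapred.job.queue.name=ingest in the job configuration.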
Hello,
I have a Hadoop cluster setup of 10 nodes and I am in need of implementing
queues in the cluster for receiving high volumes of data.
Please suggest what will be more efficient to use in the case of receiving
24 million JSON files, approx. 5 KB each, every 24 hours:
1. Using Capacity
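For scale, a quick back-of-the-envelope on the volume stated in the question (the only numbers below taken from the thread are 24 million files, 5 KB each, per 24 hours) shows the sustained rate involved, and why aggregating these small files before they hit HDFS matters:

```python
# Back-of-the-envelope ingest rate for the stated workload:
# 24 million JSON files of ~5 KB each arriving per 24 hours.

files_per_day = 24_000_000
file_size_kb = 5
seconds_per_day = 24 * 60 * 60

files_per_second = files_per_day / seconds_per_day
total_gb_per_day = files_per_day * file_size_kb / 1024 / 1024

print(f"{files_per_second:.0f} files/sec")  # roughly 278 files/sec sustained
print(f"{total_gb_per_day:.0f} GB/day")     # roughly 114 GB/day
```

At ~278 files/sec, writing each 5 KB file to HDFS individually would create tens of millions of tiny files (the classic HDFS small-files problem), so whichever queue is chosen, batching files together before or during the write is worth considering.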
Have you looked at flume?
Sent from my iPhone
On Jan 10, 2013, at 7:12 PM, Panshul Whisper ouchwhis...@gmail.com wrote:
The attached screenshot shows how Flume works; you can also
consider RabbitMQ, as it is persistent too.
∞
Shashwat Shriparv
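To make the Flume suggestion concrete: a minimal sketch of a Flume 1.x agent that watches a directory for the incoming JSON files and rolls them into large HDFS files. The agent name, directory paths, and roll size are assumptions for illustration, not details from this thread:

```
# flume.conf (Flume 1.x) - hypothetical agent "agent1"
agent1.sources  = json-src
agent1.channels = ch1
agent1.sinks    = hdfs-sink

# Watch a local spool directory for incoming JSON files.
agent1.sources.json-src.type = spooldir
agent1.sources.json-src.spoolDir = /var/incoming/json
agent1.sources.json-src.channels = ch1

# Durable, disk-backed channel so events survive an agent restart.
agent1.channels.ch1.type = file

# Write into HDFS, rolling at ~128 MB to avoid millions of tiny files.
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/ingest/json
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
agent1.sinks.hdfs-sink.hdfs.rollSize = 134217728
agent1.sinks.hdfs-sink.hdfs.rollCount = 0
agent1.sinks.hdfs-sink.channel = ch1
```

The file channel plays the role of the durable queue mentioned earlier in the thread, while the HDFS sink batches many small events into few large files.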
On Fri, Jan 11, 2013 at 10:24 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
Your question is unclear: HDFS has no queues for ingesting data (it is
a simple, distributed filesystem). The Hadoop M/R and Hadoop YARN
components have queues for data-processing purposes.
On Fri, Jan 11, 2013 at 8:42 AM, Panshul Whisper ouchwhis...@gmail.com wrote: