Unless I'm missing something, it sounds like the OP wants to chain jobs where
the results from one job are the input to another...
Of course, it's Sunday morning and I haven't had my first cup of coffee, so I
could be misinterpreting the OP's question.
If the OP wanted to send the data to each node
Praveenesh,
Well, it gives you more convenience :). If you have worked in R, then you
might notice that with R you can write a mapper as an lapply (using rmr).
They have already abstracted a lot of stuff for you, so you have less control
over things. But still, as far as convenience is concerned, it's damn
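For comparison, the plain-streaming alternative discussed below would be
submitted roughly like this sketch; the jar path and the mapper.R/reducer.R
script names are illustrative assumptions, not from the original thread.

  # sketch: running R mapper/reducer scripts via Hadoop streaming
  # (jar location and script names are assumptions for illustration)
  hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
    -input /user/praveenesh/input \
    -output /user/praveenesh/output \
    -mapper mapper.R \
    -reducer reducer.R \
    -file mapper.R -file reducer.R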
Yeah, but I am facing a weird situation, in which my RHadoop job (using
rmr) is taking much more time than my Hadoop streaming job in R. So I wanted
to see if others have also faced the same problem, or whether anyone did a
performance evaluation of Revolution's rmr?
Thanks,
Praveenesh
On Mon, Jan 30, 2012
Hi All,
I am new to Hadoop. Please let me know the details of the hardware required
for a Hadoop cluster setup
(min 3-node cluster).
I would like to know the OS, memory, CPU, network, and storage details
required for this.
Thanks
MRK
MRK,
Can you explain the types (and sizes) of problems you are hoping to solve?
If you are simply testing, you can use a laptop and two virtual machines.
I presume you're looking for more, though.
Kindest regards.
Ron
On Sun, Jan 29, 2012 at 10:00 PM, renuka renumetuk...@gmail.com wrote:
Hi
Hi Ron,
Thanks for the reply. As of now we are learning and using Hadoop to generate
our internal application reports, so I would like to know the hardware
requirements for a small cluster setup.
Thanks
MRK
Renuka,
Hadoop has very basic hardware requirements and can even run as a cluster
of regular PCs/laptops you may already have lying around.
This is good enough to get started with for learning/testing purposes.
For production deployments, the use case would define your cluster's memory
and storage needs.
Hi guys!
I have run Mumak with FIFO. It works fine.
I am trying to run the job trace in test/data with the capacity scheduler.
I have done:
1. Built contrib/capacity-scheduler
2. Copied the hadoop-*-capacity jar from build/contrib/capacity_scheduler to lib/
3. Set mapred.jobtracker.taskScheduler to
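The message is cut off at step 3; a typical mapred-site.xml entry would look
like the sketch below. The value shown is the stock class shipped with
contrib/capacity-scheduler, stated as an assumption since the original is
truncated.

  <!-- sketch of the mapred-site.xml entry for step 3; the value is the
       stock capacity scheduler class, an assumption here -->
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
  </property>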
Is there any way we can kill Hadoop jobs that are taking too long to
execute?
What I want to achieve is: if some job is running for more than
_some_predefined_timeout_limit, it should be killed automatically.
Is it possible to achieve this through shell scripts or any other way?
You might want to take a look at the kill command: hadoop job -kill
<jobid>.
Prashant
On Sun, Jan 29, 2012 at 11:06 PM, praveenesh kumar praveen...@gmail.comwrote:
Is there any way we can kill Hadoop jobs that are taking too long to
execute?
What I want to achieve is: if
Yeah, I am aware of that, but it needs you to explicitly monitor the job,
look up the job ID, and then run the hadoop job -kill command.
What I want to know is whether there is any way to do all this automatically
by providing some timer or something -- so that if my job is taking more than
some predefined time, it would
Thanks, Edward. I'll do that.
On Fri, Jan 27, 2012 at 6:36 PM, Edward Capriolo edlinuxg...@gmail.comwrote:
Task trackers sometimes do not clean up their mapred temp directories well;
if that is the case, the TT on startup can spend many minutes deleting
files. I use find to delete files older than
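The message is truncated, but a cleanup along these lines would typically
look like the sketch below; the local directory path and the 7-day cutoff
are illustrative assumptions.

  # sketch: purge stale mapred temp files older than 7 days
  # (the path and the age cutoff are assumptions for illustration)
  find /tmp/hadoop/mapred/local -type f -mtime +7 -delete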
In the current stable releases, this is available at the task level, with a
default of 10 minutes of non-responsiveness per task. It is controlled
per-job via mapred.task.timeout.
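For instance, the per-task timeout can be changed at submission time; the
sketch below assumes the job's driver uses ToolRunner, and the jar name and
the 30-minute value (the property takes milliseconds) are illustrative.

  # sketch: set the per-task timeout to 30 minutes for one job
  # (jar name, class name, and timeout value are assumptions)
  hadoop jar myjob.jar MyJob -D mapred.task.timeout=1800000 in_dir out_dir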
There is no built-in feature that lets you monitor and set a timeout
on the job execution itself, however (but it should be easy to do) -- How
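A minimal sketch of such a watchdog, assuming a POSIX shell with the hadoop
CLI on the PATH; the one-hour default and the grep over hadoop job -list
output are assumptions, not a tested recipe.

  #!/bin/sh
  # sketch: kill a job if it is still running after a given timeout
  # usage: watchdog.sh <job_id> [timeout_seconds]
  JOB_ID=$1
  TIMEOUT=${2:-3600}   # default: one hour (an assumption)
  sleep "$TIMEOUT"
  # if the job still appears in the running-job list, kill it
  if hadoop job -list | grep -q "$JOB_ID"; then
    hadoop job -kill "$JOB_ID"
  fi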