Re: Send Data and Message to all nodes

2012-01-29 Thread Michael Segel
Unless I'm missing something, it sounds like the OP wants to chain jobs where the results from one job are the input to another... Of course, it's Sunday morning and I haven't had my first cup of coffee, so I could be misinterpreting the OP's question. If the OP wanted to send the data to each node

Re: Any info on R+Hadoop

2012-01-29 Thread Prashant Sharma
Praveenesh, Well, it gives you more convenience :). If you have worked in R, you might notice that with R you can write a mapper as an lapply (using rmr). They have already abstracted a lot of stuff for you, so you have less control over things. But still, as far as convenience is concerned, it's damn

Re: Any info on R+Hadoop

2012-01-29 Thread praveenesh kumar
Yeah, but I am facing a weird situation in which my RHadoop job (using rmr) is taking much more time than my Hadoop streaming job in R. So I wanted to see if others have also faced the same problem, or if anyone has done any performance evaluation of Revolution's rmr? Thanks, Praveenesh On Mon, Jan 30, 2012

Regarding Hardware required for hadoop cluster

2012-01-29 Thread renuka
Hi All, I am new to Hadoop. Please let me know the details of the hardware required for a Hadoop cluster setup (minimum 3-node cluster). I would like to know the OS, memory, CPU, network, and storage details required for this. Thanks MRK

Re: Regarding Hardware required for hadoop cluster

2012-01-29 Thread Ronald Petty
MRK, Can you explain the types (and sizes) of problems you are hoping to solve? If you are simply testing, you can use a laptop and two virtual machines. I presume you're looking for more, though. Kindest regards. Ron On Sun, Jan 29, 2012 at 10:00 PM, renuka renumetuk...@gmail.com wrote: Hi

Re: Regarding Hardware required for hadoop cluster

2012-01-29 Thread renuka
Hi Ron, Thanks for the reply. As of now we are learning Hadoop and using it to generate our internal application reports, so I would like to know the hardware requirements for a small cluster setup. Thanks MRK

Re: Regarding Hardware required for hadoop cluster

2012-01-29 Thread Harsh J
Renuka, Hadoop has very modest hardware requirements and can even run clusters on regular PCs/laptops you may already have lying around. This is good enough to get started with for learning/testing purposes. For deployments, the use case would define your cluster's memory and storage needs.

Mumak with Capacity Scheduler : Submitting jobs to a particular queue

2012-01-29 Thread ArunKumar
Hi guys! I have run Mumak with FIFO and it works fine. Now I am trying to run the job trace in test/data with the capacity scheduler. I have done: 1. Built contrib/capacity-scheduler 2. Copied the hadoop-*-capacity jar from build/contrib/capacity_scheduler to lib/ 3. Set mapred.jobtracker.taskScheduler to
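
For reference, the scheduler switch in step 3 is an ordinary Hadoop XML property, and on a regular cluster a job picks its queue via mapred.job.queue.name. A minimal config sketch, assuming the standard CapacityTaskScheduler class and an illustrative extra queue named "myqueue" (queue names and capacity numbers are made up; how Mumak maps trace jobs onto queues is not covered here):

    <!-- mapred-site.xml: enable the capacity scheduler and declare the queues -->
    <property>
      <name>mapred.jobtracker.taskScheduler</name>
      <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
    </property>
    <property>
      <name>mapred.queue.names</name>
      <value>default,myqueue</value>
    </property>

    <!-- capacity-scheduler.xml: illustrative per-queue capacities (percent) -->
    <property>
      <name>mapred.capacity-scheduler.queue.default.capacity</name>
      <value>70</value>
    </property>
    <property>
      <name>mapred.capacity-scheduler.queue.myqueue.capacity</name>
      <value>30</value>
    </property>

A normally submitted job would then select the queue by setting mapred.job.queue.name=myqueue in its job configuration.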

Killing hadoop jobs automatically

2012-01-29 Thread praveenesh kumar
Is there any way we can kill Hadoop jobs that are taking too long to execute? What I want to achieve is: if some job is running for more than _some_predefined_timeout_limit, it should be killed automatically. Is it possible to achieve this through shell scripts or any other way?

Re: Killing hadoop jobs automatically

2012-01-29 Thread Prashant Kommireddi
You might want to take a look at the kill command: hadoop job -kill jobid. Prashant On Sun, Jan 29, 2012 at 11:06 PM, praveenesh kumar praveen...@gmail.com wrote: Is there any way we can kill Hadoop jobs that are taking too long to execute? What I want to achieve is - If

Re: Killing hadoop jobs automatically

2012-01-29 Thread praveenesh kumar
Yeah, I am aware of that, but it requires you to explicitly monitor the job, look up the job id, and then run the hadoop job -kill command. What I want to know is: is there any way to do all this automatically by providing some timer or something -- so that if my job is taking more than some predefined time, it would

Re: jobtracker url(Critical)

2012-01-29 Thread hadoop hive
Thanks Edward, I'll do that. On Fri, Jan 27, 2012 at 6:36 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Task trackers sometimes do not clean up their mapred temp directories well; if that is the case, the TT on startup can spend many minutes deleting files. I use find to delete files older than
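
Edward's quoted tip boils down to removing stale files under the TaskTracker's local directory before starting it. A rough sketch of the same idea, assuming mapred.local.dir points at /tmp/hadoop/mapred/local (the path and the 7-day cutoff are illustrative, not values from the thread) and that the TaskTracker is stopped while this runs:

    #!/usr/bin/env python
    # Delete stale TaskTracker temp files, same idea as
    # `find <mapred.local.dir> -type f -mtime +7 -delete`.
    import os
    import time

    LOCAL_DIR = "/tmp/hadoop/mapred/local"   # assumed value of mapred.local.dir
    CUTOFF = time.time() - 7 * 24 * 3600     # anything older than 7 days

    for root, dirs, files in os.walk(LOCAL_DIR):
        for name in files:
            path = os.path.join(root, name)
            try:
                if os.path.getmtime(path) < CUTOFF:
                    os.remove(path)
            except OSError:
                pass  # file vanished or permission issue; skip it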

Re: Killing hadoop jobs automatically

2012-01-29 Thread Harsh J
In the current stables, this is available at the task level with a default fo 10m of non-responsiveness per task. Controlled per-job via mapred.task.timeout. There is no built-in feature that lets you monitor and set a timeout on the job execution itself, however (but should be easy to do) -- How