Ec2 and MR Job question

2008-06-14 Thread Billy Pearson
I have a question someone may have answered here before, but I cannot find the answer. Assume I have a cluster of servers hosting a large amount of data, and I want to run a large job whose maps take a lot of CPU power while the reduces take only a small amount of CPU. I want to run th…

Re: Ec2 and MR Job question

2008-06-14 Thread Chris K Wensel
Well, to answer your last question first, just set the number of reducers to zero. But you can't run reducers without mappers (as far as I know, having never tried), so your local job will need to run identity mappers to feed your reducers. http://hadoop.apache.org/core/docs/r0.16.4/
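Chris's suggestion splits the work into two jobs: a map-only job (zero reducers) whose map output is the final output, and a second job whose identity mappers feed that stored output to the real reducers. The following is a toy model of that data flow in plain Java, so it runs without a cluster. It is not Hadoop's API: `Record`, `runJob`, `wordMapper`, `identityMapper`, `sumReducer` and `pipeline` are hypothetical names, and `wordMapper` merely stands in for the CPU-heavy map. In the classic `org.apache.hadoop.mapred` API the corresponding pieces are, as far as I know, `JobConf.setNumReduceTasks(0)` and `org.apache.hadoop.mapred.lib.IdentityMapper`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;

// Toy, in-memory model of the two-job split. NOT Hadoop's API: every name here
// is a hypothetical stand-in used only to show the data flow.
public class TwoJobSplit {

    public record Record(String key, String value) {}

    // Runs one "job". With numReduceTasks == 0 the map output is returned as the
    // final output, mirroring a map-only job; otherwise it is grouped and sorted
    // by key (the shuffle) and handed to the reducer.
    public static List<Record> runJob(List<Record> input,
                                      Function<Record, List<Record>> mapper,
                                      int numReduceTasks,
                                      BiFunction<String, List<String>, List<Record>> reducer) {
        List<Record> mapOutput = new ArrayList<>();
        for (Record r : input) {
            mapOutput.addAll(mapper.apply(r));
        }
        if (numReduceTasks == 0) {
            return mapOutput; // map-only: nothing left to run after the maps
        }
        Map<String, List<String>> grouped = new TreeMap<>(); // keys kept sorted
        for (Record r : mapOutput) {
            grouped.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r.value());
        }
        List<Record> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            out.addAll(reducer.apply(e.getKey(), e.getValue()));
        }
        return out;
    }

    // Hypothetical placeholder for the CPU-heavy map: emit (word, "1") per word.
    public static List<Record> wordMapper(Record r) {
        List<Record> out = new ArrayList<>();
        for (String w : r.value().split(" ")) {
            out.add(new Record(w, "1"));
        }
        return out;
    }

    // Identity mapper: passes each stored record through unchanged.
    public static List<Record> identityMapper(Record r) {
        return List.of(r);
    }

    // Cheap reduce: sum the counts for one key.
    public static List<Record> sumReducer(String key, List<String> values) {
        int sum = 0;
        for (String v : values) {
            sum += Integer.parseInt(v);
        }
        return List.of(new Record(key, Integer.toString(sum)));
    }

    public static List<Record> pipeline(List<Record> input) {
        // Job 1 (e.g. on the EC2 nodes): map-only, zero reducers.
        List<Record> stored = runJob(input, TwoJobSplit::wordMapper, 0, null);
        // Job 2 (e.g. on the local cluster): identity maps feeding the reducer.
        return runJob(stored, TwoJobSplit::identityMapper, 1, TwoJobSplit::sumReducer);
    }

    public static void main(String[] args) {
        List<Record> input = List.of(new Record("doc1", "a b a"), new Record("doc2", "b c"));
        System.out.println(pipeline(input));
    }
}
```

Running `main` prints the per-word counts. The point of the split is that job 1's output is complete before job 2 starts, so the machines that ran job 1 are no longer needed, assuming that output was written somewhere that outlives them.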

Re: Ec2 and MR Job question

2008-06-14 Thread Billy Pearson
I understand how to run it as two jobs. My only question is: is there a way to make the mappers store the final output in HDFS, so I can kill the EC2 machines without waiting until the reduce stage ends? Billy

Re: Ec2 and MR Job question

2008-06-14 Thread Billy Pearson
My second question is about the EC2 machines: has anyone solved the hostname problem in an automated way? For example, if I launch an EC2 server to run a TaskTracker, it reports its internal address back to my local cluster as its hostname, so the local reduce tasks cannot access the map files on the EC2 m…

Re: Ec2 and MR Job question

2008-06-16 Thread Chanchal James
Hi Billy, when I tested Hadoop on an EC2 machine, I didn't come across the hostname problem, probably because I changed the hostname to a public FQDN.
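Chanchal's fix was to give the EC2 node a public FQDN as its hostname. A small check like the following shows which name the local Java resolver returns for a machine, which is useful after renaming it. Two assumptions, neither confirmed in this thread: that this name approximates what a Hadoop daemon advertises by default, and that EC2-internal DNS names end in `.internal`. `HostnameCheck`, `canonicalHostName` and `looksInternal` are hypothetical names.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Prints the name the local Java resolver returns for this machine.
// Assumption: this approximates the hostname a Hadoop daemon would report.
public class HostnameCheck {

    // Returns the canonical (fully qualified) host name, or null if lookup fails.
    public static String canonicalHostName() {
        try {
            return InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    // True if the name looks like an EC2-internal DNS name, which hosts outside
    // EC2 generally cannot resolve. Assumption: internal names end in ".internal".
    public static boolean looksInternal(String host) {
        return host != null && host.endsWith(".internal");
    }

    public static void main(String[] args) {
        String host = canonicalHostName();
        System.out.println("canonical host name: " + host);
        if (looksInternal(host)) {
            System.out.println("warning: this looks like an EC2-internal name; "
                + "nodes outside EC2 probably cannot resolve it");
        }
    }
}
```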