Re: "Reduce input groups" vs "Reduce input records"

2011-03-25 Thread Todd Lipcon
Hi Pedro, Reduce Input Groups is the number of unique keys fed into the reducers. Reduce Input Records is the number of values. Each key has one or more values associated with it coming into the reducer. For example, with the canonical wordcount example, "reduce input groups" would be the total n
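A small simulation makes Todd's distinction concrete. The sketch below is plain Java, not the Hadoop API, and the class and method names are hypothetical. It assumes no combiner runs: a combiner would shrink "Reduce input records" but leave "Reduce input groups" unchanged, because the set of distinct keys stays the same.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Plain-Java simulation of the two reduce-side counters for wordcount.
// Hypothetical names; no Hadoop classes are used. Assumes no combiner, so
// every (word, 1) pair emitted by the mappers reaches the reducers unchanged.
public class ReduceCounters {

    // "Reduce input records": every value delivered to a reducer.
    static long reduceInputRecords(List<String> mapOutputKeys) {
        return mapOutputKeys.size();
    }

    // "Reduce input groups": each distinct key, i.e. one reduce() call per key.
    static long reduceInputGroups(List<String> mapOutputKeys) {
        return new HashSet<>(mapOutputKeys).size();
    }

    public static void main(String[] args) {
        // Map output keys for the input line "the cat saw the dog"
        List<String> keys = Arrays.asList("the", "cat", "saw", "the", "dog");
        System.out.println("Reduce input groups:  " + reduceInputGroups(keys));  // 4
        System.out.println("Reduce input records: " + reduceInputRecords(keys)); // 5
    }
}
```

"the" appears twice, so it forms one group holding two values; that is why groups (4) is smaller than records (5).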

Re: map tasks vs launched map tasks

2011-03-25 Thread Allen Wittenauer
On Mar 25, 2011, at 10:09 AM, Pedro Costa wrote: > Hi, > > during the setup phase and the cleanup phase of the tasks, the Hadoop > MR uses map tasks to do it. Do these tasks appear in the counters shown > at the end of an example? > For example, the counter below shows that my example ran 9 map ta

Re: change number of slots in MR

2011-03-25 Thread David Rosenstrauch
On 03/25/2011 03:07 PM, Pedro Costa wrote: I don't know if this is what I want. I want to set the number of slots that are available for the map and the reduce tasks to run. I don't want to define the number of tasks. On Fri, Mar 25, 2011 at 6:44 PM, David Rosenstrauch wrote: On 03/25/2011 02:

Re: change number of slots in MR

2011-03-25 Thread Pedro Costa
I don't know if this is what I want. I want to set the number of slots that are available for the map and the reduce tasks to run. I don't want to define the number of tasks. On Fri, Mar 25, 2011 at 6:44 PM, David Rosenstrauch wrote: > On 03/25/2011 02:26 PM, Pedro Costa wrote: >> >> Hi, >> >> is

Re: change number of slots in MR

2011-03-25 Thread David Rosenstrauch
On 03/25/2011 02:26 PM, Pedro Costa wrote: Hi, is it possible to configure the total number of slots that a TaskTracker has, to run the map and reduce tasks? Thanks, Yes. See the mapred.map.tasks and mapred.reduce.tasks settings. HTH, DR
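Pedro's distinction between tasks and slots deserves spelling out. In Hadoop 0.20/1.x (MRv1), `mapred.map.tasks` and `mapred.reduce.tasks` are per-job settings (the map count is only a hint, since the input splits decide it). The number of slots each TaskTracker offers is set by `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum` in that TaskTracker's `mapred-site.xml`, and usually takes effect only after the TaskTracker restarts. The hypothetical sketch below shows how slots bound concurrency under a deliberately simplified model.

```java
// Back-of-the-envelope model of how per-TaskTracker slots (MRv1) bound
// concurrency. slotsPerTracker stands for the value of
// mapred.tasktracker.map.tasks.maximum (or ...reduce.tasks.maximum);
// the task count is a property of the job, not of the tracker.
public class SlotWaves {

    // Total concurrent task slots across the cluster.
    static int clusterSlots(int slotsPerTracker, int trackers) {
        return slotsPerTracker * trackers;
    }

    // Number of "waves" needed to run all tasks. Simplification: ignores
    // data locality, speculative execution, failures and other jobs.
    static int waves(int tasks, int slotsPerTracker, int trackers) {
        int slots = clusterSlots(slotsPerTracker, trackers);
        if (slots <= 0) {
            throw new IllegalArgumentException("cluster has no slots");
        }
        return (tasks + slots - 1) / slots; // ceiling division
    }

    public static void main(String[] args) {
        // 9 map tasks, 2 trackers with 2 map slots each: 4 slots, 3 waves
        System.out.println(waves(9, 2, 2));
    }
}
```

Raising the per-tracker maximum increases how many tasks run at once; it does not change how many tasks a job has.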

change number of slots in MR

2011-03-25 Thread Pedro Costa
Hi, is it possible to configure the total number of slots that a TaskTracker has, to run the map and reduce tasks? Thanks, -- Pedro

map tasks vs launched map tasks

2011-03-25 Thread Pedro Costa
Hi, during the setup phase and the cleanup phase of the tasks, the Hadoop MR uses map tasks to do it. Do these tasks appear in the counters shown at the end of an example? For example, the counter below shows that my example ran 9 map tasks and 2 reduce tasks, but the Launched map task has the value

Re: A way to monitor HDFS for a file to come live, and then kick off a job?

2011-03-25 Thread Allen Wittenauer
On Mar 24, 2011, at 10:09 AM, Jonathan Coveney wrote: > I am not sure if this is the right listserv, forgive me if it is not. A better choice would likely be hdfs-user@, since this is really about watching files in HDFS. > My > goal is this: monitor HDFS until a file is created, and th

Re: A way to monitor HDFS for a file to come live, and then kick off a job?

2011-03-25 Thread Mapred Learn
Does the Oozie coordinator work? The last time I tried it, it had a lot of problems: i) jobs from start to end_timestamp were all submitted at once, not at the actual wall-clock time. ii) The links to all the jobs in a particular coordinator workflow were not working, i.e. you were not able to see the p

"Reduce input groups" vs "Reduce input records"

2011-03-25 Thread Pedro Costa
Hi, in this MR example, there are the fields "Reduce input groups" and "Reduce input records". What's the difference between these two fields? $ hadoop jar cloud9.jar edu.umd.cloud9.example.simple.DemoWordCount data/bible+shakes.nopunc wc 1 10/07/11 22:25:42 INFO simple.DemoWordCount: Tool: DemoWor

Re: A way to monitor HDFS for a file to come live, and then kick off a job?

2011-03-25 Thread Bai, Gang
Hi Jon, Oozie could handle this nicely. You could just specify an Oozie coordinator job. But if you don't have an Oozie server handy, cron jobs could also meet your needs. Regards, -BaiGang On Fri, Mar 25, 2011 at 1:09 AM, Jonathan Coveney wrote: > I am not sure if this is the right listserv, forg
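The cron approach amounts to a periodic existence check followed by a job submission. Below is a minimal polling sketch, assuming a pluggable check. Against HDFS one would pass a supplier that calls Hadoop's `FileSystem.exists(Path)` and wraps its `IOException`. The `main` method uses the local filesystem as a stand-in, and the default file name `input-ready.flag` is hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.BooleanSupplier;

// Minimal sketch: wait until a file exists, then run an action (e.g. submit a job).
// The existence check is pluggable so the same loop works for HDFS or local files.
public class WaitForFile {

    // Polls `exists` every `intervalMillis` until it returns true or
    // `timeoutMillis` elapses. Runs `onReady` and returns true on success;
    // returns false on timeout or if the waiting thread is interrupted.
    static boolean waitThenRun(BooleanSupplier exists, Runnable onReady,
                               long intervalMillis, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            if (exists.getAsBoolean()) {
                onReady.run();
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical marker file; pass a real path as the first argument.
        Path marker = Paths.get(args.length > 0 ? args[0] : "input-ready.flag");
        boolean started = waitThenRun(
                () -> Files.exists(marker),
                () -> System.out.println("File is live; submit the job here"),
                1000, 10_000);
        if (!started) {
            System.out.println("Timed out waiting for " + marker);
        }
    }
}
```

Under cron, the loop is unnecessary: each scheduled run can perform a single check and exit, with cron supplying the polling interval.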

RandomWriter - less than 10 GB data

2011-03-25 Thread Robert Grandl
Hi all, I need to generate random data with RandomWriter. Can somebody tell me how I can generate less than 10 GB of data in total? Which options should I configure in the configuration file? Many thanks in advance, Robert
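A sizing sketch for RandomWriter. It assumes the 0.20-era example reads `test.randomwrite.bytes_per_map` (default 1 GB) and `test.randomwriter.maps_per_host` (default 10). Under that assumption, a single-node cluster produces 10 maps × 1 GB = 10 GB, which matches the total Robert describes. Check the exact property names against the RandomWriter source for your Hadoop version. The arithmetic below is a hypothetical illustration, not the example's own code.

```java
// Arithmetic sketch of RandomWriter output sizing.
// Assumption: total bytes = maps per host * bytes per map * TaskTrackers,
// with assumed defaults of 10 maps per host and 1 GiB per map.
public class RandomWriterSize {
    static final long GIB = 1024L * 1024 * 1024;

    static long totalBytes(int mapsPerHost, long bytesPerMap, int trackers) {
        return (long) mapsPerHost * bytesPerMap * trackers;
    }

    // Bytes per map needed to reach roughly `targetBytes` (rounded down).
    static long bytesPerMapFor(long targetBytes, int mapsPerHost, int trackers) {
        return targetBytes / ((long) mapsPerHost * trackers);
    }

    public static void main(String[] args) {
        // Assumed defaults on a single-node cluster: 10 * 1 GiB * 1 = 10 GiB
        System.out.println(totalBytes(10, GIB, 1) / GIB + " GiB");
        // For about 1 GiB instead, keep 10 maps and shrink each map's output
        System.out.println(bytesPerMapFor(GIB, 10, 1) + " bytes per map");
    }
}
```

If the assumption holds, lowering the bytes-per-map value (or the maps-per-host value) in the job configuration shrinks the total. The second line of output shows the per-map value that gives roughly 1 GiB on one node.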