Re: How the number of mapper tasks is calculated

2011-07-25 Thread Chiku Singh
What is your use case? Why would you want to use only 5 mappers and not all 10 task trackers? "If an individual file is so large that it will affect seek time it will be split to several Splits" (http://wiki.apache.org/hadoop/HadoopMapReduce) "if a split span over more than one dfs blo…
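As a rough illustration of how the number of splits comes out of the input size, here is a minimal sketch of the split-size rule used by the old mapred FileInputFormat in 0.20.x (the input sizes and the map-count hint below are made-up example values):

    // Sketch of how org.apache.hadoop.mapred.FileInputFormat sizes splits (0.20.x, old API):
    // splitSize = max(minSize, min(goalSize, blockSize)), where goalSize is the total input
    // size divided by the requested number of maps (mapred.map.tasks is only a hint).
    public class SplitSizeSketch {
        static long computeSplitSize(long goalSize, long minSize, long blockSize) {
            return Math.max(minSize, Math.min(goalSize, blockSize));
        }

        public static void main(String[] args) {
            long totalInputBytes = 5L * 128 * 1024 * 1024; // e.g. five 128 MB files
            long blockSize = 64L * 1024 * 1024;            // 64 MB HDFS block size
            long minSize = 1;                              // default mapred.min.split.size
            int numMapsHint = 10;                          // hint, not a hard limit
            long splitSize = computeSplitSize(totalInputBytes / numMapsHint, minSize, blockSize);
            System.out.println("Split size: " + splitSize
                + ", approx map tasks: " + (totalInputBytes / splitSize));
        }
    }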

Re: MR 0.20.2 job chaining

2011-07-25 Thread Harsh J
What you may be looking for is a workflow system such as Oozie (yahoo.github.com/oozie/) or Azkaban (http://sna-projects.com/azkaban/). If your needs are simple (2-3 jobs, not too many conditions, etc., per workflow), you can check out the JobControl API (http://hadoop.apache.org/common/docs/r0.20.2…
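For the simple case, a sketch of the JobControl route in the old mapred API follows; conf1 and conf2 are placeholders for fully configured JobConf objects (mapper, reducer, input/output paths), not a complete program:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.jobcontrol.Job;
    import org.apache.hadoop.mapred.jobcontrol.JobControl;

    public class TwoStepWorkflow {
        public static void main(String[] args) throws Exception {
            JobConf conf1 = new JobConf();  // configure the first job here
            JobConf conf2 = new JobConf();  // second job, reads the first job's output

            Job step1 = new Job(conf1);
            Job step2 = new Job(conf2);
            step2.addDependingJob(step1);   // step2 only starts after step1 succeeds

            JobControl control = new JobControl("two-step-workflow");
            control.addJob(step1);
            control.addJob(step2);

            Thread runner = new Thread(control);  // JobControl is a Runnable
            runner.start();
            while (!control.allFinished()) {
                Thread.sleep(5000);               // poll until both jobs complete
            }
            control.stop();
        }
    }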

How the number of mapper tasks is calculated

2011-07-25 Thread Anfernee Xu
I have a general question about how the number of mapper tasks is calculated. As far as I know, the number is primarily based on the number of splits: if I have 5 splits and 10 tasktrackers running in the cluster, I will have 5 mapper tasks running in my MR job, right? But what I found i…
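One way to verify that reasoning, assuming the old mapred API and a hypothetical input path, is to ask the input format how many splits it would produce for the job's configuration:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;

    public class CountSplits {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf();
            FileInputFormat.setInputPaths(conf, new Path("/user/me/input")); // placeholder path
            TextInputFormat format = new TextInputFormat();
            format.configure(conf);
            // The number of splits returned is the number of map tasks the job would launch.
            InputSplit[] splits = format.getSplits(conf, conf.getNumMapTasks());
            System.out.println("Map tasks that would be launched: " + splits.length);
        }
    }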

MR 0.20.2 job chaining

2011-07-25 Thread Ross Nordeen
Hello all, I am trying to write an MR program where the output from the mappers is dependent on the previous map processes. I understand that a job scheduler exists to control such processes. Would anyone be able to give some sample code of a working implementation of this in Hadoop 0.20.2?

Re: Job tracker error

2011-07-25 Thread Gagan Bansal
Thanks everyone. After setting HADOOP_CLIENT_OPTS, the error changed: the number of tasks my job was launching was now more than 100,000, which I believe is the maximum set on my cluster. This was because my job had more than 100,000 input files. I merged some files so that the total nu…
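One way to do that kind of merge from code, shown here as a sketch with placeholder paths, is FileUtil.copyMerge, which concatenates every file under a source directory into a single output file so the job produces far fewer splits:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class MergeSmallFiles {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path srcDir = new Path("/user/me/many-small-files"); // placeholder source directory
            Path merged = new Path("/user/me/merged/part-all");  // placeholder merged output
            // false = keep the source files; the last argument is an optional separator string.
            FileUtil.copyMerge(fs, srcDir, fs, merged, false, conf, null);
        }
    }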

Re: NodeManager not able to connect to the ResourceManager (MRv2)

2011-07-25 Thread Ramya Sunil
Hi Praveen, can you please look at the RM logs and check if there are any errors/exceptions? I ran into a similar issue when my RM was down. Also, the defaults are present at http://svn.apache.org/repos/asf/hadoop/common/branches/MR-279/mapreduce/yarn/yarn-server/yarn-server-common/src/main/resou…
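A small sketch of the kind of configuration check that helps here: print the ResourceManager addresses that get loaded from yarn-default.xml/yarn-site.xml on the classpath. The property constants below are the ones in later YARN releases, so treat them as assumptions; the MR-279 branch may use different keys.

    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class CheckRmAddress {
        public static void main(String[] args) {
            // YarnConfiguration loads yarn-default.xml and yarn-site.xml from the classpath.
            YarnConfiguration conf = new YarnConfiguration();
            // Property names are from later YARN releases and may differ on the MR-279 branch.
            System.out.println("Resource tracker address: "
                + conf.get(YarnConfiguration.RM_RESOURCE_TRACKER_ADDRESS));
            System.out.println("Scheduler address: "
                + conf.get(YarnConfiguration.RM_SCHEDULER_ADDRESS));
        }
    }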