Re: setNumTasks

2012-03-22 Thread Mohit Anchlia
Could someone please help me answer this question? On Wed, Mar 14, 2012 at 8:06 AM, Mohit Anchlia mohitanch...@gmail.com wrote: What is the corresponding system property for setNumTasks? Can it be used explicitly as a system property like mapred.tasks.?

Re: hadoop permission guideline

2012-03-22 Thread Suresh Srinivas
Can you please take this discussion to the CDH mailing list? On Mar 22, 2012, at 7:51 AM, Michael Wang michael.w...@meredith.com wrote: I have installed Cloudera hadoop (CDH). I used its Cloudera Manager to install all the needed packages. When it was installed, root was used. I found the

Re: setNumTasks

2012-03-22 Thread Mohit Anchlia
Sorry, I meant *setNumMapTasks*. What is mapred.map.tasks for? It's confusing what its purpose is. I tried setting it for my job, but I still see more map tasks running than *mapred.map.tasks*. On Thu, Mar 22, 2012 at 7:53 AM, Harsh J ha...@cloudera.com wrote: There isn't such an API as

Re: setNumTasks

2012-03-22 Thread Bejoy Ks
Hi Mohit The number of map tasks is determined by the number of input splits and the InputFormat used by your MR job. Setting this value won't let you control it. AFAIK it only takes effect if the value in mapred.map.tasks is greater than the number of tasks calculated by the Job
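
The interplay Bejoy describes can be sketched in plain Java. This is a simplified model of the old FileInputFormat-style split arithmetic (goal size derived from the requested map count, then clamped by the block size), not the actual Hadoop source; the names and the 64 MB block size are illustrative:

```java
// Simplified sketch of why mapred.map.tasks is only a hint: the split
// size is clamped by the block size, so requesting fewer maps than
// totalSize/blockSize has no effect. Illustrative, not Hadoop source.
public class SplitMath {
    static long computeSplitSize(long totalSize, long blockSize,
                                 long minSize, int requestedMaps) {
        long goalSize = totalSize / Math.max(1, requestedMaps);
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    static long numSplits(long totalSize, long splitSize) {
        return (totalSize + splitSize - 1) / splitSize; // ceiling division
    }

    public static void main(String[] args) {
        long total = 1024L * 1024 * 1024;   // 1 GB of input
        long block = 64L * 1024 * 1024;     // 64 MB block size
        // Requesting only 2 maps cannot push splits below totalSize/blockSize:
        long split = computeSplitSize(total, block, 1, 2);
        System.out.println(numSplits(total, split)); // prints 16, not 2
    }
}
```

With a larger requested map count the goal size shrinks below the block size and does take effect, which matches Bejoy's point that the setting only matters when it asks for more tasks than the split calculation produces.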

Re: hadoop permission guideline

2012-03-22 Thread Harsh J
Hi Michael, Am moving your question to the scm-us...@cloudera.org group, which is home to the community of Cloudera Manager users. You will get better responses there. In case you wish to browse or subscribe to this group, visit https://groups.google.com/a/cloudera.org/forum/#!forum/scm-users

Re: setNumTasks

2012-03-22 Thread Shi Yu
If you want to control the number of input splits at fine granularity, you could customize NLineInputFormat. You need to determine the number of lines per split, so what you need to know beforehand is the number of lines in your input data, for instance using hadoop fs -text /input/dir/* |
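
Shi's suggestion boils down to: count the input lines up front, then pick a lines-per-split value so that NLineInputFormat yields the number of map tasks you want. A rough sketch of that arithmetic, assuming the line count is already known (names illustrative):

```java
// Sketch: choosing an NLineInputFormat lines-per-split value to hit a
// target number of map tasks, given a pre-counted number of input lines.
public class NLineMath {
    // Lines per split so that roughly targetMaps splits result.
    static int linesPerSplit(long totalLines, int targetMaps) {
        return (int) ((totalLines + targetMaps - 1) / targetMaps); // ceiling
    }

    // Number of splits an N-lines-per-split format would then produce.
    static long numSplits(long totalLines, int linesPerSplit) {
        return (totalLines + linesPerSplit - 1) / linesPerSplit;
    }

    public static void main(String[] args) {
        long totalLines = 1_000_000;  // e.g. counted beforehand with wc -l
        int perSplit = linesPerSplit(totalLines, 8);
        System.out.println(perSplit);                        // 125000
        System.out.println(numSplits(totalLines, perSplit)); // 8 map tasks
    }
}
```

The computed value would then be handed to the job via NLineInputFormat's lines-per-map configuration property for the Hadoop version in use.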

Re: rack awareness and safemode

2012-03-22 Thread Patai Sangbutsarakum
I restarted the cluster yesterday with rack awareness enabled. Things went well; I can confirm that there were no issues at all. Thank you all again. On Tue, Mar 20, 2012 at 4:19 PM, Patai Sangbutsarakum silvianhad...@gmail.com wrote: Thank you all. On Tue, Mar 20, 2012 at 2:44 PM, Harsh J

Re: rack awareness and safemode

2012-03-22 Thread John Meagher
Make sure you run hadoop fsck /. It should report a lot of blocks with the replication policy violated. In the short term it isn't anything to worry about, and everything will work fine even with those errors. Run the script I sent out earlier to fix those errors and bring everything into

Re: tasktracker/jobtracker.. expectation..

2012-03-22 Thread Bejoy Ks
Hi Patai The JobTracker automatically handles this situation by attempting the task on different nodes. Could you verify the number of attempts these failed tasks made? Was it just one? If more, were all the task attempts triggered on the same node or not? Did all of them fail with
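
The behaviour Bejoy describes (re-attempting a failed task, preferring a node that has not already failed it, up to a maximum attempt count) can be modelled as a toy scheduler. Everything below is an illustrative sketch, not JobTracker source:

```java
import java.util.*;

// Toy model of JobTracker-style task retries: each new attempt prefers a
// node that has not already failed this task; after maxAttempts failures
// the task (and hence the job) is marked failed.
public class RetrySketch {
    static String scheduleAttempt(List<String> nodes, Set<String> failedOn) {
        for (String n : nodes) {
            if (!failedOn.contains(n)) return n; // pick a fresh node first
        }
        return nodes.get(0); // every node has failed it; reuse one
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node1", "node2", "node3");
        Set<String> failedOn = new HashSet<>();
        int maxAttempts = 4;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String node = scheduleAttempt(nodes, failedOn);
            System.out.println("attempt " + attempt + " on " + node);
            failedOn.add(node); // pretend every attempt fails
        }
    }
}
```

This is why Bejoy's diagnostic questions matter: if all attempts landed on one node, the node is suspect; if attempts failed across several nodes, the task's code or data is.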

Re: Number of retries

2012-03-22 Thread Bejoy KS
Mohit If you are writing to a db from a job in an atomic way, this would pop up. You can avoid it only by disabling speculative execution. Drilling down from the web UI to the task level will show you the tasks that had multiple attempts. --Original Message-- From: Mohit

Re: Number of retries

2012-03-22 Thread Bejoy KS
Hi Mohit To add on, duplicates won't occur if your output is written to an HDFS file, because once one attempt of a task completes, only that attempt's output file is copied to the final output destination; the files generated by the other task attempts that are killed are simply ignored. Regards Bejoy KS
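
The commit behaviour Bejoy describes can be modelled simply: with speculative execution, several attempts of the same task may run, but only the first to complete gets its output promoted; the rest are discarded. A toy sketch with illustrative names, not the actual Hadoop OutputCommitter:

```java
import java.util.*;

// Toy model of task-output commit: only the first successful attempt of
// each task is promoted to the final output; later attempts are ignored.
public class CommitSketch {
    private final Map<String, String> committed = new HashMap<>(); // taskId -> attemptId

    // Returns true if this attempt's output was promoted, false if discarded.
    boolean commit(String taskId, String attemptId) {
        return committed.putIfAbsent(taskId, attemptId) == null;
    }

    public static void main(String[] args) {
        CommitSketch c = new CommitSketch();
        System.out.println(c.commit("task_0001", "attempt_0")); // true: promoted
        System.out.println(c.commit("task_0001", "attempt_1")); // false: duplicate ignored
    }
}
```

This per-task commit gate is also why side effects performed inside the task, such as a direct DB write, bypass the protection and can show duplicates unless speculative execution is disabled.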

number of partitions

2012-03-22 Thread Harun Raşit ER
I wrote a custom partitioner. But when I work in standalone or pseudo-distributed mode, the number of partitions is always 1. I set the number of reducers to 4, but the numOfPartitions parameter of the custom partitioner is still 1 and all four of my mappers' results are going to 1 reducer. The other
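
A likely explanation: in local mode, older Hadoop versions (via LocalJobRunner) run at most one reducer regardless of setNumReduceTasks, so the partitioner is invoked with numPartitions = 1 and every key lands in partition 0. The default hash-partitioner arithmetic makes this obvious; the sketch below mimics that logic standalone rather than using the Hadoop class itself:

```java
// Sketch of HashPartitioner-style logic: with numPartitions == 1 every
// key maps to partition 0, which is what local mode produces.
public class PartitionSketch {
    static int getPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        String[] keys = {"alpha", "beta", "gamma", "delta"};
        for (String k : keys) {
            System.out.println(k + " -> " + getPartition(k, 1)); // all go to 0
        }
        for (String k : keys) {
            System.out.println(k + " -> " + getPartition(k, 4)); // spread over 0..3
        }
    }
}
```

Running the same job on a real cluster with four reduce tasks should show the custom partitioner receiving numPartitions = 4.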

hadoop on cygwin : tasktracker is throwing error : need help

2012-03-22 Thread Santosh Borse
I have installed hadoop on cygwin to help me write MR code in Windows Eclipse. 2012-03-22 22:19:57,896 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.io.IOException: Failed to set permissions of path: \tmp\hadoop-uygwin\mapred\local\ttprivate to 0700