Could someone please help me answer this question?
On Wed, Mar 14, 2012 at 8:06 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
What is the corresponding system property for setNumTasks? Can it be set
explicitly as a system property, like mapred.tasks.?
Can you please take this discussion to the CDH mailing list?
On Mar 22, 2012, at 7:51 AM, Michael Wang <michael.w...@meredith.com> wrote:
I have installed Cloudera Hadoop (CDH). I used its Cloudera Manager to
install all the needed packages. When it was installed, the root user was
used. I found the
Sorry, I meant *setNumMapTasks*. What is mapred.map.tasks for? It's
confusing what its purpose is. I tried setting it for my job, but I
still see more map tasks running than *mapred.map.tasks*
On Thu, Mar 22, 2012 at 7:53 AM, Harsh J ha...@cloudera.com wrote:
There isn't such an API as
Hi Mohit
The number of map tasks is determined by the number of input splits
and the InputFormat used by your MR job. Setting this value won't help
you control it. AFAIK it only takes effect if the value in
mapred.map.tasks is greater than the number of splits calculated by the Job
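Harsh's point about the hint only working "upward" can be sketched numerically. A minimal sketch, assuming the old-API FileInputFormat arithmetic (a goal size derived from the mapred.map.tasks hint, capped by the block size); the file and block sizes here are made up for illustration:

```java
// Sketch of how the old-API FileInputFormat treats mapred.map.tasks as a
// hint (assumption: simplified from the Hadoop 1.x getSplits logic).
public class SplitMath {
    // goalSize is totalSize / numSplitsHint, where the hint comes from
    // mapred.map.tasks (i.e. setNumMapTasks).
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    static long numSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize; // ceiling division
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;  // 64 MB HDFS block
        long fileSize  = 256L * 1024 * 1024; // one 256 MB input file

        // Hint of 2 maps: goalSize = 128 MB, but it is capped at the block
        // size, so we still get 4 splits -- the hint cannot reduce the count.
        long split = computeSplitSize(fileSize / 2, 1, blockSize);
        System.out.println(numSplits(fileSize, split)); // prints 4

        // Hint of 8 maps: goalSize = 32 MB < blockSize, so the hint wins
        // and we get 8 splits -- a larger hint can increase the count.
        split = computeSplitSize(fileSize / 8, 1, blockSize);
        System.out.println(numSplits(fileSize, split)); // prints 8
    }
}
```

This matches the behaviour described above: a hint smaller than the split count implied by the block size is ignored, a larger one takes effect.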
Hi Michael,
I am moving your question to the scm-us...@cloudera.org group, which is
home to the community of Cloudera Manager users. You will get better
responses there.
In case you wish to browse or subscribe to this group, visit
https://groups.google.com/a/cloudera.org/forum/#!forum/scm-users
If you want to control the number of input splits at a fine granularity,
you could customize NLineInputFormat. You need to determine the number
of lines per split, so you need to know beforehand the number of lines
in your input data, for instance using
hadoop fs -text /input/dir/* |
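Once the total line count is known, the N to configure on NLineInputFormat follows from a ceiling division. A hypothetical helper (the method names are mine, not Hadoop's; the line counts are made up):

```java
// Given a desired number of splits, derive the N (lines per split) to hand
// to NLineInputFormat, assuming the input's total line count is known.
public class NLineMath {
    static int linesPerSplit(long totalLines, int desiredSplits) {
        // Ceiling division so the last, partial split is accounted for.
        return (int) ((totalLines + desiredSplits - 1) / desiredSplits);
    }

    static long resultingSplits(long totalLines, int linesPerSplit) {
        return (totalLines + linesPerSplit - 1) / linesPerSplit;
    }

    public static void main(String[] args) {
        long totalLines = 1_000_000;
        int n = linesPerSplit(totalLines, 40);              // want ~40 map tasks
        System.out.println(n);                              // prints 25000
        System.out.println(resultingSplits(totalLines, n)); // prints 40
    }
}
```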
I restarted the cluster yesterday with rack-awareness enabled.
Things went well; I can confirm that there were no issues at all.
Thank you all again.
On Tue, Mar 20, 2012 at 4:19 PM, Patai Sangbutsarakum
<silvianhad...@gmail.com> wrote:
Thank you all.
On Tue, Mar 20, 2012 at 2:44 PM, Harsh J
Make sure you run hadoop fsck /. It should report a lot of blocks
with the replication policy violated. In the short term it isn't
anything to worry about, and everything will work fine even with those
errors. Run the script I sent out earlier to fix those errors and
bring everything into
Hi Patai
The JobTracker automatically handles this situation by attempting the task
on different nodes. Could you verify the number of attempts these failed
tasks made? Was it just one? If more, were all the task attempts triggered
on the same node or not? Did all of them fail with
Mohit
If you are writing to a db from a job in an atomic way, this can pop
up. You can avoid it only by disabling speculative execution.
Drilling down from the web UI to the task level will show you the tasks
that had multiple attempts.
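For reference, a sketch of the configuration that turns speculative execution off, using the Hadoop 1.x-era property names (verify against your own version's mapred-default.xml):

```xml
<!-- Disable speculative attempts for both map and reduce tasks,
     e.g. in the job configuration or mapred-site.xml. -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```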
Hi Mohit
To add on, duplicates won't be there if your output is written to an HDFS
file, because once one attempt of a task completes, only that attempt's
output file is copied to the final output destination; the files generated
by the other task attempts that were killed are simply ignored.
Regards
Bejoy KS
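Bejoy's description can be modelled with plain files. A toy sketch of the promote-one-attempt idea (this is not the real FileOutputCommitter, just a simplified model of the behaviour described above; the directory names are mine):

```java
import java.io.IOException;
import java.nio.file.*;

// Toy model of output commit: two attempts of the same task write their
// own files, only the first to finish is promoted to the final output,
// and the killed speculative attempt's file is discarded.
public class CommitSketch {
    static Path run() throws IOException {
        Path out = Files.createTempDirectory("job-output");
        Path a0 = Files.createDirectories(out.resolve("_temporary/attempt_0"));
        Path a1 = Files.createDirectories(out.resolve("_temporary/attempt_1"));
        Files.writeString(a0.resolve("part-00000"), "from attempt 0\n");
        Files.writeString(a1.resolve("part-00000"), "from attempt 1\n");

        // Attempt 0 completes first: only its file reaches the final
        // output location.
        Files.move(a0.resolve("part-00000"), out.resolve("part-00000"));
        // The killed speculative attempt's output is simply discarded.
        Files.delete(a1.resolve("part-00000"));
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path out = run();
        System.out.println(Files.readString(out.resolve("part-00000")).trim());
        // prints: from attempt 0
    }
}
```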
I wrote a custom partitioner, but when I run in standalone or
pseudo-distributed mode, the number of partitions is always 1. I set the
number of reducers to 4, but the numOfPartitions parameter of my custom
partitioner is still 1 and all my four mappers' results are going to 1
reducer. The other
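The arithmetic a hash-style partitioner uses makes this symptom easy to see: if the framework hands the partitioner numPartitions == 1 (which, to the best of my knowledge, is what the local job runner does regardless of the configured reducer count), every key lands in partition 0. A self-contained sketch (the formula mirrors Hadoop's HashPartitioner; the class and keys are mine):

```java
// Demonstrates why a partitioner cannot spread keys when it is only
// ever given numPartitions == 1, as in local/standalone mode.
public class PartitionSketch {
    static int getPartition(String key, int numPartitions) {
        // Mask off the sign bit so a negative hashCode can't yield a
        // negative partition, then take the modulus.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        String[] keys = {"alpha", "bravo", "charlie", "delta"};
        for (String k : keys) {
            // With numPartitions == 1 every key maps to partition 0.
            System.out.print(getPartition(k, 1) + " "); // prints 0 each time
        }
        System.out.println();
        for (String k : keys) {
            // With 4 real reducers the keys can actually spread out.
            System.out.print(getPartition(k, 4) + " ");
        }
        System.out.println();
    }
}
```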
I have installed Hadoop on Cygwin to help me write MR code in Windows
Eclipse.
2012-03-22 22:19:57,896 ERROR org.apache.hadoop.mapred.TaskTracker: Can not
start task tracker because java.io.IOException: Failed to set permissions of
path: \tmp\hadoop-uygwin\mapred\local\ttprivate to 0700