inputsplits.
> Try configuring "mapred.min.split.size" to reduce the number of your
> mappers if you want to.
>
> And I don't know why your job gets only one reducer. Does anyone know?
>
> On Tue, Oct 7, 2008 at 9:06 AM, Andy Li <[EMAIL PROTECTED]> wrote:
>
>
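For reference, the split-size knob mentioned above can be set in hadoop-site.xml (or per-job on the JobConf); a minimal sketch, where the 128 MB value (in bytes) is purely an illustrative assumption:

```
<property>
  <name>mapred.min.split.size</name>
  <!-- illustrative value: 128 MB, in bytes; a larger minimum split size
       means fewer, larger input splits and hence fewer map tasks -->
  <value>134217728</value>
</property>
```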
Dears,
Sorry, I did not mean to cross-post, but the previous article was
accidentally posted to the HBase user list. I would like to bring it back
to the Hadoop user list, since it is confusing me a lot and it is mainly
MapReduce related.
Currently running hadoop-0.18.1 on 25 nodes. Map and
I think this is the same problem discussed in this mail thread:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg02759.html
A JIRA has been filed; please see HADOOP-2915.
On Fri, Mar 14, 2008 at 2:08 AM, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
> Hi,
> Is there any magic we can do with hadoop.dfs.umask?
What about a hot standby namenode?
For a write-ahead-log used to avoid crash-and-recovery, I think this is
fine for small I/O.
For large volumes, though, the write-ahead-log will take up a good deal of
the system's I/O resources, since it makes 2 I/Os per block (one for the
log and one for the actual data). This will fall back
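A back-of-the-envelope sketch of that 2-I/Os-per-block cost, in plain Java; the numbers and method names here are illustrative assumptions, not measurements:

```java
// Rough model: with a write-ahead log on the same disks, every block is
// written twice (once to the log, once as data), so effective write
// bandwidth is roughly halved.
public class WalCost {
    // diskMBps: raw sequential write bandwidth of the data disks
    static double effectiveWriteMBps(double diskMBps, boolean walOnSameDisks) {
        // 2 I/Os per block (log + data) when the WAL shares the data disks
        return walOnSameDisks ? diskMBps / 2.0 : diskMBps;
    }

    public static void main(String[] args) {
        // e.g. a 100 MB/s disk yields ~50 MB/s of useful data writes
        System.out.println(effectiveWriteMBps(100.0, true));
        System.out.println(effectiveWriteMBps(100.0, false));
    }
}
```

Putting the log on a separate device avoids the doubling, which is one reason dedicated WAL disks are common.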
I have encountered the same problem when running the MapReduce code under a
different user name.
This issue was brought up on the core-dev mailing list, but I didn't see any
workaround or solution.
Therefore, I would like to bring up this topic again to gain some input.
Sorry for cross-posting, but
Try these 2 parameters to utilize all the cores per node/host:

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>7</value>
  <description>The maximum number of map tasks that will be run
  simultaneously by a task tracker.</description>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>7</value>
  <description>The maximum number of reduce tasks that will be run
  simultaneously by a task tracker.</description>
</property>
Shouldn't the official way to do this be to implement your own RecordWriter
and your own OutputFormat class:
conf.setOutputFormat(yourClass);
Inside yourClass, you can return your own RecordWriter from the
getRecordWriter method.
I did it on the FileInputFormat side with my own RecordReader.
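To make the RecordWriter/OutputFormat wiring concrete, here is a minimal sketch. Note these are simplified stand-in interfaces written for illustration, NOT the real org.apache.hadoop.mapred API (the real getRecordWriter also receives a FileSystem, JobConf, name, and Progressable); the class and field names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class OutputFormatSketch {

    // Stand-in for Hadoop's RecordWriter<K, V>
    interface RecordWriter<K, V> {
        void write(K key, V value);
        void close();
    }

    // Stand-in for Hadoop's OutputFormat<K, V>
    interface OutputFormat<K, V> {
        RecordWriter<K, V> getRecordWriter(String jobName);
    }

    // "yourClass": returns a RecordWriter that formats records as key<TAB>value
    static class TabOutputFormat implements OutputFormat<String, String> {
        final List<String> sink = new ArrayList<>(); // stands in for an HDFS output file

        public RecordWriter<String, String> getRecordWriter(String jobName) {
            return new RecordWriter<String, String>() {
                public void write(String key, String value) {
                    sink.add(key + "\t" + value);
                }
                public void close() { /* flush/close the underlying stream here */ }
            };
        }
    }

    public static void main(String[] args) {
        TabOutputFormat fmt = new TabOutputFormat();
        RecordWriter<String, String> w = fmt.getRecordWriter("demo-job");
        w.write("word", "42");
        w.close();
        System.out.println(fmt.sink.get(0)); // word<TAB>42
    }
}
```

The framework calls getRecordWriter once per reduce task and then feeds every output pair to write(), so all output formatting lives in the RecordWriter.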
Thanks for both inputs. My question actually focuses more on what Vivek
mentioned.
I would like to work on the JobClient to see how it submits jobs to
different file systems and slaves in the same Hadoop cluster.
Not sure if there is a complete document explaining the scheduler underneath
Hadoop.