Re: Map and Reduce numbers are not restricted by setNumMapTasks and setNumReduceTasks, JobConf related?

2008-10-06 Thread Andy Li
…input splits. Try configuring "mapred.min.split.size" to reduce the number of your mappers if you want to. And I don't know why your reducer is just one. Does anyone know? On Tue, Oct 7, 2008 at 9:06 AM, Andy Li <[EMAIL PROTECTED]> wrote:
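The map count tracks the input splits, so raising the minimum split size is the usual lever. A minimal Java sketch against the old (hadoop-0.18-era) JobConf API; the class name and the 128 MB value are illustrative assumptions, not from the thread:

    import org.apache.hadoop.mapred.JobConf;

    public class SplitSizeExample {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Fewer, larger splits mean fewer map tasks; 128 MB is only an example value.
            conf.setLong("mapred.min.split.size", 128L * 1024 * 1024);
        }
    }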

Map and Reduce numbers are not restricted by setNumMapTasks and setNumReduceTasks, JobConf related?

2008-10-06 Thread Andy Li
Dear all, sorry, I did not mean to cross-post, but the previous article was accidentally posted to the HBase user list. I would like to bring it back to the Hadoop user list since it is confusing me a lot and it is mainly MapReduce related. Currently running hadoop-0.18.1 on 25 nodes. Map and…
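For context, the old JobConf API treats the two setters differently: the map count is only a hint (the InputFormat's splits decide it), while the reduce count is honored. A hedged sketch assuming the 0.18-era org.apache.hadoop.mapred API; the class name and the numbers are illustrative:

    import org.apache.hadoop.mapred.JobConf;

    public class TaskCountExample {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            conf.setNumMapTasks(50);     // only a hint; the real count follows the input splits
            conf.setNumReduceTasks(25);  // honored: one reduce task per partition of map output
        }
    }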

Re: [some bugs] Re: file permission problem

2008-03-14 Thread Andy Li
I think this is the same problem as the one in this mail thread: http://www.mail-archive.com/[EMAIL PROTECTED]/msg02759.html. A JIRA has been filed; please see HADOOP-2915. On Fri, Mar 14, 2008 at 2:08 AM, Stefan Groschupf <[EMAIL PROTECTED]> wrote: Hi, is there any magic we can do with hadoop.dfs.umask…

Re: long write operations and data recovery

2008-02-29 Thread Andy Li
What about a hot standby namenode? For a write-ahead log to avoid crashes and allow recovery, I think this is fine for small I/O. For large volumes, the write-ahead log will actually take up a lot of the system's I/O resources, since it costs two I/Os per block (one for the log and one for the actual data). This will fall back on how curre…

MapReduce - JobClient create new folders with superuser permission and owner:group instead of the account that runs it

2008-02-28 Thread Andy Li
I have encountered the same problem when running the MapReduce code under a different user name. This issue was brought up on the core-dev mailing list, but I didn't see any workaround or solution. Therefore, I would like to bring up this topic again to gain some input. Sorry for cross-posting, but…

Re: Questions regarding configuration parameters...

2008-02-21 Thread Andy Li
Try these two parameters to utilize all the cores per node/host:
mapred.tasktracker.map.tasks.maximum = 7 (the maximum number of map tasks that will be run simultaneously by a task tracker)
mapred.tasktracker.reduce.tasks.maximum = 7 (the maximum number of reduce tasks that will be run…
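For reference, these two properties are read by each TaskTracker at startup, so they normally live in hadoop-site.xml on every node rather than in the job itself. A hedged Java sketch only to spell out the property names and the example value of 7 from the quote above; the class name is hypothetical:

    import org.apache.hadoop.mapred.JobConf;

    public class SlotConfigExample {
        public static void main(String[] args) {
            // TaskTracker-side settings, normally placed in hadoop-site.xml on each node;
            // shown here only to illustrate the property names and sample values.
            JobConf conf = new JobConf();
            conf.setInt("mapred.tasktracker.map.tasks.maximum", 7);
            conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 7);
        }
    }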

Re: FileOutputFormat which does not write key value?

2008-02-19 Thread Andy Li
Shouldn't the official way to do this be to implement your own RecordWriter and your own OutputFormat class? conf.setOutputFormat(yourClass); Inside yourClass, you can return your own RecordWriter in the getRecordWriter method. I did it on the FileInputFormat side with my own RecordRead…
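A minimal sketch of the approach described, against the old (0.18-era) org.apache.hadoop.mapred API. The class name is hypothetical, and it assumes TextOutputFormat's line writer omits a null key, which may not hold in every release; it would be wired in with conf.setOutputFormat(ValueOnlyTextOutputFormat.class):

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RecordWriter;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.TextOutputFormat;
    import org.apache.hadoop.util.Progressable;

    // Wraps TextOutputFormat's RecordWriter and drops the key on each write.
    public class ValueOnlyTextOutputFormat<K, V> extends TextOutputFormat<K, V> {
        @Override
        public RecordWriter<K, V> getRecordWriter(FileSystem fs, JobConf job,
                                                  String name, Progressable progress)
                throws IOException {
            final RecordWriter<K, V> out = super.getRecordWriter(fs, job, name, progress);
            return new RecordWriter<K, V>() {
                public void write(K key, V value) throws IOException {
                    out.write(null, value); // null key -> only the value is written
                }
                public void close(Reporter reporter) throws IOException {
                    out.close(reporter);
                }
            };
        }
    }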

Re: Questions about the MapReduce libraries and job schedulers inside JobTracker and JobClient running on Hadoop

2008-02-15 Thread Andy Li
Thanks for both inputs. My question actually focuses more on what Vivek mentioned. I would like to work on the JobClient to see how it submits jobs to different file systems and slaves in the same Hadoop cluster. Not sure if there is a complete document explaining the scheduler underneath Hadoo…