Re: Re: Re: how to make hadoop balance automatically

2010-10-08 Thread shangan
A server not running a datanode can only be a namenode or jobtracker. I think running the copier jobs on such a server could bring unpredictable risks. I found something relevant in the book Hadoop: The Definitive Guide: When copying data into HDFS, it’s important to consider cluster balance. HDFS works best when the
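The book's point about balance rests on HDFS's default placement of the first replica of each block: it goes on the writer's own node if the writer is a datanode, and on a randomly chosen node otherwise. The toy Java model below covers just that rule. It assumes a single writer and ignores racks, the second and third replicas, and the namenode's checks for full or busy nodes. It shows why a copy run from a datanode piles up on that node. The class and method names are illustrative only and are not part of Hadoop.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Toy model of where HDFS puts the FIRST replica of each block, assuming the
 * default policy: on the writer's own node if the writer is a datanode,
 * otherwise on a randomly chosen node. Racks, later replicas, and
 * load/fullness checks are deliberately ignored.
 */
public class FirstReplicaPlacement {

    /** Marker meaning "the writer is not a datanode" (e.g. a gateway host). */
    static final int OFF_CLUSTER = -1;

    /** Counts how many first replicas land on each of numNodes datanodes. */
    static int[] place(int numNodes, int numBlocks, int writerNode, long seed) {
        Random random = new Random(seed);
        int[] counts = new int[numNodes];
        for (int b = 0; b < numBlocks; b++) {
            int target = (writerNode == OFF_CLUSTER)
                    ? random.nextInt(numNodes) // off-cluster writer: random node
                    : writerNode;              // datanode writer: always local
            counts[target]++;
        }
        return counts;
    }

    /** Largest number of first replicas held by any single node. */
    static int max(int[] values) {
        int m = Integer.MIN_VALUE;
        for (int v : values) {
            m = Math.max(m, v);
        }
        return m;
    }

    public static void main(String[] args) {
        int nodes = 5;
        int blocks = 100;
        int[] onDatanode = place(nodes, blocks, 0, 42L);
        int[] offCluster = place(nodes, blocks, OFF_CLUSTER, 42L);
        System.out.println("writer on datanode 0: " + Arrays.toString(onDatanode));
        System.out.println("writer off-cluster:   " + Arrays.toString(offCluster));
    }
}
```

With the datanode writer, every first replica ends up on node 0. With the off-cluster writer, the same blocks are spread across the nodes. This is why the book advises against funnelling a large copy through a single datanode.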

Re: how to set different VM parameters for mappers and reducers?

2010-10-08 Thread Vitaliy Semochkin
"You can set mapred.child.java.opts in mapred-site.xml": that setting affects both mappers and reducers, while I need to specify JVM options for mappers and reducers separately. Regards, Vitaliy S On Tue, Oct 5, 2010 at 5:14 PM, Jeff Zhang zjf...@gmail.com wrote: You can set mapred.child.java.opts in

Re: nodes with different memory sizes

2010-10-08 Thread Pablo Cingolani
I think you can change that in your conf/mapred-site.xml, since it's a site-specific config file (see: http://hadoop.apache.org/common/docs/current/cluster_setup.html), e.g.:
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx8G</value>
</property>
I hope this helps. Yours, Pablo Cingolani On

What is the best way to terminate a Map job without it being retried

2010-10-08 Thread Steve Kuo
I have a collection of dirty data files, which I can detect during the setup() phase of my Map job. Ideally I would quit the map job and prevent it from being retried. What is the best practice for doing this? Thanks in advance.

Re: What is the best way to terminate a Map job without it being retried

2010-10-08 Thread Ted Yu
How about deleting or moving the dirty files in your mapper, or in another job? On Fri, Oct 8, 2010 at 4:30 PM, Steve Kuo kuosen...@gmail.com wrote: I have a collection of dirty data files, which I can detect during the setup() phase of my Map job. It would be best that I can quit the map job
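Ted's "another job" option can be a small step that runs before the MapReduce job is submitted, so no map task ever has to fail. Below is a minimal Java sketch. It assumes the input sits on a filesystem reachable through java.nio.file. On HDFS, the same loop would go through Hadoop's FileSystem API (listStatus and rename) instead. The dirtiness test isEmpty is a hypothetical stand-in for whatever check setup() currently performs, and the class name is illustrative only.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Runs before the MapReduce job is submitted: every input file that fails a
 * validity check is moved into a quarantine directory, so the job only sees
 * clean files. Illustrative names; not part of Hadoop.
 */
public class QuarantineDirtyInputs {

    /** Hypothetical dirtiness check: empty or unreadable files count as dirty. */
    static boolean isEmpty(Path file) {
        try {
            return Files.size(file) == 0;
        } catch (IOException e) {
            return true; // treat files we cannot inspect as dirty
        }
    }

    /** Moves every regular file in inputDir accepted by isDirty into quarantineDir. */
    static List<Path> quarantine(Path inputDir, Path quarantineDir, Predicate<Path> isDirty)
            throws IOException {
        // Collect candidates first so the directory is not modified while it is listed.
        List<Path> dirty = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inputDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file) && isDirty.test(file)) {
                    dirty.add(file);
                }
            }
        }
        Files.createDirectories(quarantineDir);
        List<Path> moved = new ArrayList<>();
        for (Path file : dirty) {
            Path target = quarantineDir.resolve(file.getFileName());
            Files.move(file, target); // fails if a file of that name is already quarantined
            moved.add(target);
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java QuarantineDirtyInputs <inputDir> <quarantineDir>
        List<Path> moved = quarantine(Paths.get(args[0]), Paths.get(args[1]),
                QuarantineDirtyInputs::isEmpty);
        System.out.println("Quarantined " + moved.size() + " file(s): " + moved);
    }
}
```

The check runs once, outside any task. A dirty file therefore never reaches a mapper, and there is nothing for the framework to retry.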