Re: reducing mappers for a job

2011-11-17 Thread Paolo Rodeghiero
On 17/11/2011 05:00, He Chen wrote: Hi Jay Vyas, Ke yuan's method may decrease the number of mappers because, by default, the number of mappers for a job equals the number of blocks in the job's input file. Hi, I'm not in a production phase, so I can only reference things that I have read. First, may be

reducing mappers for a job

2011-11-16 Thread Jay Vyas
Hi guys: In a shared cluster environment, what's the best way to reduce the number of mappers per job? Should you do it with input splits? Or simply toggle the values in the JobConf (i.e. increase the number of bytes in an input split)? -- Jay Vyas MMSB/UCHC
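The split-size knob Jay asks about can be sketched with a little arithmetic. In Hadoop's FileInputFormat (new API), the split size is roughly max(minSize, min(maxSize, blockSize)), and one map task is launched per split, so raising the minimum split size above the block size yields fewer, larger splits and hence fewer mappers. A minimal Python sketch of that arithmetic (the function names here are illustrative, not the Hadoop API; the 64M default block size and 1 GB file are assumptions for the example):

```python
import math

def compute_split_size(block_size, min_size, max_size):
    # Mirrors the shape of FileInputFormat.computeSplitSize:
    # splitSize = max(minSize, min(maxSize, blockSize))
    return max(min_size, min(max_size, block_size))

def num_mappers(file_size, split_size):
    # One map task per input split (for a splittable input file)
    return math.ceil(file_size / split_size)

MB = 1024 * 1024
file_size = 1024 * MB   # a hypothetical 1 GB input file
block_size = 64 * MB    # classic default HDFS block size

# Default settings: tiny min split, effectively unbounded max split
default_split = compute_split_size(block_size, 1, float("inf"))
print(num_mappers(file_size, default_split))   # 16 mappers

# Raise the minimum split size to 256 MB -> fewer, larger splits
big_split = compute_split_size(block_size, 256 * MB, float("inf"))
print(num_mappers(file_size, big_split))       # 4 mappers
```

In real Hadoop this corresponds to raising the minimum-split-size job property rather than touching the files themselves, which is why it is attractive on a shared cluster: it changes the job, not the stored data.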

Re: reducing mappers for a job

2011-11-16 Thread ke yuan
Just set the block size to 128M or 256M; it may reduce the number of mappers per job. 2011/11/17 Jay Vyas jayunit...@gmail.com: Hi guys: In a shared cluster environment, what's the best way to reduce the number of mappers per job? Should you do it with input splits? Or simply toggle the values in the
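Ke yuan's suggestion works because, by default, one mapper is created per HDFS block of the input file. A quick back-of-the-envelope sketch (pure Python, no Hadoop dependency; the 1 GB file size is an assumption for illustration) of what the proposed block sizes do to the mapper count:

```python
import math

MB = 1024 * 1024
file_size = 1024 * MB  # assumed 1 GB input file

# By default, mapper count ~= ceil(fileSize / blockSize)
for block_size_mb in (64, 128, 256):
    mappers = math.ceil(file_size / (block_size_mb * MB))
    print(f"{block_size_mb}M blocks -> {mappers} mappers")
```

Note that the HDFS block size applies when a file is written, so existing files keep their original block size unless rewritten; that trade-off is part of what the later replies in this thread debate.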

Re: reducing mappers for a job

2011-11-16 Thread ke yuan
Yes, you're right, but: 1) "waste of disk space": this is not right; this will not waste the disk space of the datanode. If you don't believe it, you can look at the code. 2) "difficulty to balance HDFS": this may be true. 3) "low Map-stage data locality": why? 2011/11/17 He Chen airb...@gmail.com: Hi Jay Vyas, Ke