On 17/11/2011 05:00, He Chen wrote:
Hi Jay Vyas,
Ke yuan's method may decrease the number of mappers, because by default
the number of mappers for a job equals the number of blocks in the job's
input file.
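As a rough illustration of the rule quoted above (by default, one mapper per input block), the mapper count for a single splittable file can be estimated as follows. This is only a sketch: the helper name and the 10 GB file size are made up for illustration, and it ignores details such as multiple input files, unsplittable compression formats, and empty files.

```python
MB = 1024 * 1024

def estimate_mappers(file_size_bytes, block_size_bytes):
    """Hypothetical helper: approximate map task count for one splittable file."""
    if file_size_bytes <= 0 or block_size_bytes <= 0:
        # Empty files and invalid sizes are outside the scope of this sketch.
        raise ValueError("sizes must be positive")
    # Integer ceiling division: a partial last block still needs its own mapper.
    return -(-file_size_bytes // block_size_bytes)

# Assumed example: one 10 GB input file stored with different block sizes.
file_size = 10 * 1024 * MB
for block_mb in (64, 128, 256):
    print(block_mb, "MB blocks ->", estimate_mappers(file_size, block_mb * MB), "mappers")
```

Under these assumptions, doubling the block size halves the estimated mapper count.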
Hi,
I'm not in production yet, so I'm only repeating things that I have read.
First, may be
Hi guys: In a shared cluster environment, what's the best way to reduce the
number of mappers per job? Should you do it with inputSplits? Or simply
change the values in the JobConf (i.e. increase the number of bytes in an
input split)?
--
Jay Vyas
MMSB/UCHC
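To make the second option in the question concrete, here is a small sketch of how a larger minimum split size leads to fewer splits. It assumes the split-size rule max(minSize, min(maxSize, blockSize)), which is how Hadoop's newer FileInputFormat computes split sizes as far as I know; the function names and the numbers below are hypothetical, and the older JobConf-based API uses a slightly different rule based on a goal size.

```python
MB = 1024 * 1024
NO_LIMIT = 2**63 - 1  # stand-in for "no maximum split size configured"

def compute_split_size(block_size, min_size, max_size):
    """Assumed rule: clamp the block size between the min and max split sizes."""
    return max(min_size, min(max_size, block_size))

def estimate_splits(file_size, split_size):
    """Ceiling division: a partial last split still counts as one split."""
    return -(-file_size // split_size)

# Assumed example: one 10 GB file on a cluster with 64 MB blocks.
file_size = 10 * 1024 * MB
block_size = 64 * MB

default_split = compute_split_size(block_size, 1, NO_LIMIT)
raised_split = compute_split_size(block_size, 256 * MB, NO_LIMIT)

print(estimate_splits(file_size, default_split))  # 160
print(estimate_splits(file_size, raised_split))   # 40
```

In this sketch the block size on disk is unchanged; only the per-job split size grows, so each mapper reads several blocks.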
Just set the block size to 128M or 256M; this may reduce the number of mappers per job.
2011/11/17 Jay Vyas jayunit...@gmail.com
Hi guys: In a shared cluster environment, what's the best way to reduce the
number of mappers per job? Should you do it with inputSplits? Or simply
change the values in the
Yes, you're right, but:
1) Waste of disk space: this is not right. It will not waste disk space on
the datanode; if you don't believe it, you can check the code.
2) Difficulty balancing HDFS: this may be true.
3) Low map-stage data locality: why?
2011/11/17 He Chen airb...@gmail.com
Hi Jay Vyas
Ke