Re: how to use mapred.min.split.size option ?

2011-05-25 Thread Mapred Learn
>> >> I will go through this.. >> >> >> >> Sent from my iPhone >> >> On May 25, 2011, at 7:51 AM, Juwei Shi wrote: >> >> >> >> The following are suitable for hadoop 0.20.2. >> >> >> >> 2011/5/25 Ju

Re: how to use mapred.min.split.size option ?

2011-05-25 Thread Mapred Learn
I will go through this.. > >> > >> Sent from my iPhone > >> On May 25, 2011, at 7:51 AM, Juwei Shi wrote: > >> > >> The following are suitable for hadoop 0.20.2. > >> > >> 2011/5/25 Juwei Shi > >>> > >>> Th

Re: how to use mapred.min.split.size option ?

2011-05-25 Thread Mapred Learn
>> The input split size is determined by mapred.min.split.size, dfs.block.size >> and mapred.map.tasks. >> >> goalSize = totalSize / mapred.map.tasks >> minSize = max(mapred.min.split.size, minSplitSize) >> splitSize = max(minSize, min(goalSize, dfs.bloc

Re: how to use mapred.min.split.size option ?

2011-05-25 Thread Mapred Learn
> > goalSize = totalSize / mapred.map.tasks > minSize = max(mapred.min.split.size, minSplitSize) > splitSize = max(minSize, min(goalSize, dfs.block.size)) > > minSplitSize is determined by each InputFormat such as > SequenceFileInputFormat. > > You may want

Re: how to use mapred.min.split.size option ?

2011-05-25 Thread Juwei Shi
The following are suitable for hadoop 0.20.2. 2011/5/25 Juwei Shi > The input split size is determined by mapred.min.split.size, dfs.block.size and > mapred.map.tasks. > > goalSize = totalSize / mapred.map.tasks > minSize = max(mapred.min.split.size, minSplitSize) > splitSize =

Re: how to use mapred.min.split.size option ?

2011-05-25 Thread Juwei Shi
The input split size is determined by mapred.min.split.size, dfs.block.size and mapred.map.tasks.
goalSize = totalSize / mapred.map.tasks
minSize = max(mapred.min.split.size, minSplitSize)
splitSize = max(minSize, min(goalSize, dfs.block.size))
minSplitSize is determined by each InputFormat such as
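The formula in this message can be tried out with a few lines of Python. This is a minimal sketch, not Hadoop's own code: the function and argument names are made up for illustration, all sizes are in bytes, and the sample numbers (10 GB of input, a 64 MB block size, 2 requested map tasks) are assumptions.

```python
MB = 1024 * 1024
GB = 1024 * MB

def compute_split_size(total_size, map_tasks, min_split_conf,
                       format_min_split, block_size):
    """Apply the three-step formula from the message (sizes in bytes)."""
    # goalSize = totalSize / mapred.map.tasks (guard against a zero hint)
    goal_size = total_size // max(map_tasks, 1)
    # minSize = max(mapred.min.split.size, minSplitSize)
    min_size = max(min_split_conf, format_min_split)
    # splitSize = max(minSize, min(goalSize, dfs.block.size))
    return max(min_size, min(goal_size, block_size))

# Hypothetical job: minimum split size left at 1 byte.
default_split = compute_split_size(10 * GB, 2, 1, 1, 64 * MB)

# Same job with the minimum split size raised to 1 GB.
large_split = compute_split_size(10 * GB, 2, 1 * GB, 1, 64 * MB)

print(default_split // MB, large_split // MB)  # prints: 64 1024
```

With these numbers the block size caps the split at 64 MB until the minimum split size is raised above it, at which point the minimum wins.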

Re: how to use mapred.min.split.size option ?

2011-05-25 Thread Mapred Learn
Resending > > Hi, > I have a few input splits that are a few MB in size. > I want to submit 1 GB of input to every mapper. Does anyone know how I can do > it ? > Currently each mapper gets one input split, which results in many small > map-output files. > > I tried setting -Dmapred.map.min.spli

how to use mapred.min.split.size option ?

2011-05-24 Thread Mapred Learn
Hi, I have a few input splits that are a few MB in size. I want to submit 1 GB of input to every mapper. How can I do it? Currently each mapper gets one input split, which results in many small map-output files. I tried setting -Dmapred.map.min.split.size= , but it still does not take effect. Thanks,
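One possible explanation, assuming the small inputs are separate files and the job uses a FileInputFormat-style input format: such formats compute splits file by file, so a split never spans two files and a larger minimum split size cannot merge small files. The sketch below is a simplified model of that behaviour, not Hadoop's code; the function name, the slop factor of 1.1 and the sample file sizes are assumptions for illustration.

```python
MB = 1024 * 1024
GB = 1024 * MB
SPLIT_SLOP = 1.1  # assumed tolerance before a file's tail gets its own split

def splits_for_file(file_size, split_size):
    """Return the split sizes produced for one file (sizes in bytes)."""
    sizes = []
    remaining = file_size
    # Carve off full-size splits while clearly more than one split remains.
    while remaining / split_size > SPLIT_SLOP:
        sizes.append(split_size)
        remaining -= split_size
    # Whatever is left becomes the last split of this file.
    if remaining > 0:
        sizes.append(remaining)
    return sizes

# Hypothetical input: 256 files of 4 MB each (1 GB in total),
# with the desired split size set to 1 GB.
small_files = [4 * MB] * 256
splits = [s for f in small_files for s in splits_for_file(f, 1 * GB)]

print(len(splits), max(splits) // MB)  # prints: 256 4
```

In this model every small file still yields its own 4 MB split, so the job runs 256 mappers regardless of the setting. If that matches the situation, an input format that packs several files into one split (for example CombineFileInputFormat, where available) may be closer to what is wanted.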

Re: mapred.min.split.size

2011-03-18 Thread Pedro Costa
As I understand it, mapred.min.split.size defines the minimum size of a split. In the case below: (1) HDFS block size = 32MB, mapred.min.split.size=64MB (mapred.min.split.size can only be set larger than the HDFS block size) when I run mapreduce, it means that a map will run one input split of 64MB
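The numbers in case (1) can be checked against the split-size formula quoted for hadoop 0.20.2 in the later thread on this page, splitSize = max(minSize, min(goalSize, blockSize)). This is a small sketch under stated assumptions: the goal size of 1024 MB is hypothetical (any value of at least the block size gives the same result), and the variable names are illustrative.

```python
MB = 1024 * 1024

block_size = 32 * MB       # HDFS block size from case (1)
min_split_size = 64 * MB   # mapred.min.split.size from case (1)
goal_size = 1024 * MB      # assumed: total input size / requested map tasks

# splitSize = max(minSize, min(goalSize, blockSize))
split_size = max(min_split_size, min(goal_size, block_size))

# Number of HDFS blocks that one split covers.
blocks_per_split = split_size // block_size

print(split_size // MB, blocks_per_split)  # prints: 64 2
```

Under these assumptions each map task reads a 64 MB split that covers two 32 MB blocks.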

Re: mapred.min.split.size

2011-03-18 Thread Marcos Ortiz
On 3/18/2011 3:54 PM, Pedro Costa wrote: Hi What's the purpose of the parameter "mapred.min.split.size"? Thanks, There are many parameters that control the number of map tasks for a Job, and mapred.min.split.size controls the minimum size of a split. Other

Re: mapred.min.split.size

2011-03-18 Thread Ted Yu
e of the parameter "mapred.min.split.size"? > > Thanks, > -- > Pedro >

mapred.min.split.size

2011-03-18 Thread Pedro Costa
Hi What's the purpose of the parameter "mapred.min.split.size"? Thanks, -- Pedro