Another option is to either reduce the block size of the input data or to
disable the combine input format and split the data into more files.
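
For example (a sketch; these are the 0.20-era property names, so
double-check them against your version):

  -- in the pig script, give each block/file its own mapper:
  SET pig.noSplitCombination true;

  # or write the input into HDFS with a smaller block size (e.g. 64MB):
  hadoop fs -D dfs.block.size=67108864 -put input.txt /data/input/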


On Sat, Jun 23, 2012 at 5:58 PM, Yang <teddyyyy...@gmail.com> wrote:
> hi Sheng:
>
> I had exactly the same problem as you did.
>
> Right now, with Hadoop 0.20 and above, you can't do it anymore, because the
> new mapreduce.lib.input.FileInputFormat dropped the original
> mapred.map.tasks control that was used to compute the goalSize in the
> getSplits() method. The old mapred.FileInputFormat class still had this
> control.
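>
> (For context, from memory of the two getSplits() implementations: the old
> mapred.FileInputFormat computed goalSize = totalSize / mapred.map.tasks
> and then splitSize = max(minSize, min(goalSize, blockSize)), so raising
> mapred.map.tasks shrank the splits; the new class computes
> splitSize = max(minSize, min(maxSize, blockSize)) and never consults
> mapred.map.tasks.)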
>
> I submitted https://issues.apache.org/jira/browse/HADOOP-8503 to add back
> this control
>
>
> Because Pig actually compiles some Hadoop classes into its own jar,
> including this FileInputFormat class, you could work around this by
> patching your own hadoop jar, building Pig against that jar, and then using
> your re-built Pig in production. You need to make sure to use the full pig
> jar instead of pig-withouthadoop.jar.
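>
> Roughly, that workaround looks like this (a sketch from memory; the exact
> paths and ant targets depend on your source trees, so treat every line
> here as an assumption to verify):
>
>   # 1. patch FileInputFormat in the hadoop source and rebuild its jar
>   cd hadoop-0.20.2 && ant jar
>   # 2. make pig's build pick up the patched jar, then rebuild pig
>   cp build/hadoop-core-*.jar ../pig/lib/
>   cd ../pig && ant jar     # deploy the resulting full pig.jar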
>
> You can also achieve part of the same goal by setting
> mapred.max.split.size, but this is rather inflexible: if your pig
> script generates several MR jobs, the same split size will hold for all of
> them, which may not be ideal if one stage consumes a lot more input data
> than another.
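>
> For example (a sketch):
>
>   SET mapred.max.split.size '67108864';  -- caps every split at ~64MB, in every job of the script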
>
>
> Yang
>
> On Sat, Jun 23, 2012 at 1:48 PM, Sheng Guo <enigma...@gmail.com> wrote:
>
>> Thanks for all your help.
>>
>> My pig script may have some CPU-intensive jobs, like NLP processing, so it
>> would be helpful to have multiple mappers running. Correct me if I am
>> wrong.
>> Thanks,
>>
>> Sheng
>>
>> On Sat, Jun 23, 2012 at 9:40 AM, Scott Foster <scottf.con...@gmail.com
>> >wrote:
>>
>> > You can also turn off split combination completely, and then the number
>> > of mappers will equal the number of blocks:
>> >
>> >   SET pig.noSplitCombination true;
>> >
>> > Adding mappers may not make your process run faster since the time to
>> > read the data may be less than the overhead of creating a new JVM for
>> > each map task.
>> >
>> > scott.
>> >
>>
