The script you posted is a map-only job; it has no reduce phase at all, so the reducer settings never come into play.
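
Since the FOREACH statements run inside the mappers, parallelism here is governed by the number of input splits, not by default_parallel or PARALLEL. If you want more concurrent tasks chewing on the 400GB input, the usual knob is split size. A rough sketch, assuming your sequence files are splittable (the 128 MB figure is just an illustration, tune it to your cluster):

------------------------------------------------
-- Cap the size of Pig's combined splits so more map tasks get scheduled.
-- 134217728 bytes = 128 MB.
SET pig.maxCombinedSplitSize 134217728;
-- Also cap the raw Hadoop split size (before Pig combines splits).
SET mapred.max.split.size 134217728;
------------------------------------------------

Both properties are passed straight through to the job configuration, so double-check the names against your Pig/Hadoop versions.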
2013/3/15 <tsuna...@o2.pl>

> Dear Apache Pig Users,
>
> It is easy to control the number of reducers in JOIN, GROUP, COGROUP,
> etc. statements with a global "set default_parallel $NUM" command or a
> "parallel $NUM" clause at the end of the line.
>
> However, I am interested in controlling the number of reducers in a
> foreach statement.
> The case is as follows:
> * on CDH 4.0.1 with Pig 0.9.2,
> * read one sequence file (of many equivalent files) of about 400GB,
> * process each element in a UDF __using as many reducers as possible__,
> * store the results.
>
> The Apache Pig script implementing this case -- which gives __only one__
> reducer -- is below:
> ------------------------------------------------
> SET default_parallel 16;
> REGISTER myjar.jar;
> input_pairs = LOAD '$input' USING
>     pl.example.MySequenceFileLoader('org.apache.hadoop.io.BytesWritable',
>     'org.apache.hadoop.io.BytesWritable')
>     AS (key:chararray, value:bytearray);
> input_protos = FOREACH input_pairs GENERATE
>     FLATTEN(pl.example.ReadProtobuf(value));
> output_protos = FOREACH input_protos GENERATE
>     FLATTEN(pl.example.XMLGenerator(*));
> STORE output_protos INTO '$output' USING PigStorage();
> ------------------------------------------------
>
> As far as I know, "set mapred.reduce.tasks 5" can only limit the maximum
> number of reducers.
>
> Could you give me some advice? Am I missing something?
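
If, on the other hand, you really do want the UDF work to happen on the reduce side, one way is to introduce an operator that forces a shuffle. A rough sketch along those lines, inserting an artificial GROUP ahead of the FOREACH (assumes the keys are reasonably well distributed; a skewed key would funnel most of the data to a few reducers):

------------------------------------------------
-- The GROUP exists only to force a reduce phase; PARALLEL sets the
-- reducer count for it.
grouped = GROUP input_pairs BY key PARALLEL 16;
-- Flatten the bags back into (key, value) pairs; this FOREACH and the
-- ones after it now run in the reducers.
redistributed = FOREACH grouped GENERATE FLATTEN(input_pairs);
input_protos = FOREACH redistributed GENERATE
    FLATTEN(pl.example.ReadProtobuf(value));
------------------------------------------------

The extra shuffle means sorting and moving all 400GB across the network, so I'd only go this route if the map-side approach really doesn't work for you.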