Finally, I solved the problem. Thank you.

It is fixed in version 0.12.1.

I downloaded the source code from GitHub, changed the Pig version, and
built the 0.12.1 version with the h2 option.

Now it calculates the input source file size and creates multiple reduce
tasks.
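For reference, this is the arithmetic the estimator performs once it can
read the input size (the formula comes from the InputSizeReducerEstimator
code quoted further down; the 2.5 GB input size here is a made-up example):

    // Worked example of the reducer estimate, using the settings from the
    // log below (bytesPerReducer=100000000, maxReducers=999). The input
    // size is hypothetical; in the failing runs it came back as -1.
    public class ReducerEstimateExample {
        public static void main(String[] args) {
            long bytesPerReducer = 100000000L;      // pig.exec.reducers.bytes.per.reducer
            int maxReducers = 999;                  // pig.exec.reducers.max
            long totalInputFileSize = 2500000000L;  // pretend 2.5 GB of input

            int reducers = (int) Math.ceil((double) totalInputFileSize / bytesPerReducer);
            reducers = Math.max(1, reducers);       // never fewer than 1
            reducers = Math.min(maxReducers, reducers);
            System.out.println(reducers);           // prints 25
        }
    }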

Thank you very much... happy weekend ~ bye




2014-02-07 15:48 GMT+09:00 최종원 <jongwons.c...@gmail.com>:

> Thank you for your answer.
>
> But where can I find the Pig source for the pig-0.12.0-h2 version?
> I think there must be a difference between pig-0.12.0 and pig-0.12.0-h2,
> but I cannot find the source for version 0.12.0-h2.
>
> When I extract the jar file, there are additional packages (like
> org.apache.pig.backend.hadoop23.PigJobControl ...).
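A quick way to tell the two builds apart is to look for that hadoop23 shim
package inside the jar. A minimal sketch (the jar file name is an example):

    import java.util.jar.JarFile;

    // Illustrative check: the Hadoop-2 ("h2") build carries the extra
    // org.apache.pig.backend.hadoop23 package mentioned above, so its
    // presence distinguishes the two jars.
    public class CheckH2Jar {
        public static void main(String[] args) throws Exception {
            try (JarFile jar = new JarFile("pig-0.12.0-h2.jar")) {
                boolean isH2 = jar.stream().anyMatch(
                        e -> e.getName().startsWith("org/apache/pig/backend/hadoop23/"));
                System.out.println(isH2 ? "Hadoop 2 (h2) build" : "Hadoop 1 build");
            }
        }
    }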
>
>
>
>
>
>
>
> 2014-02-07 10:59 GMT+09:00 Cheolsoo Park <piaozhe...@gmail.com>:
>
>> Hi,
>> Sounds like you're bitten by PIG-3512:
>> https://issues.apache.org/jira/browse/PIG-3512
>>
>> Can you try to apply the patch and rebuild the jar?
>>
>> Thanks,
>> Cheolsoo
>>
>>
>>
>> On Thu, Feb 6, 2014 at 7:27 PM, 최종원 <jongwons.c...@gmail.com> wrote:
>>
>> > This is the log:
>> >
>> > 2014-02-06 17:29:19,087 [Thread-42] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
>> > 2014-02-06 17:29:19,087 [Thread-42] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
>> > 2014-02-06 17:29:19,087 [Thread-42] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=100000000 maxReducers=999 totalInputFileSize=-1
>> > 2014-02-06 17:29:19,087 [Thread-42] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Could not estimate number of reducers and no requested or default parallelism set. Defaulting to 1 reducer.
>> > 2014-02-06 17:29:19,087 [Thread-42] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
>> > 2014-02-06 17:29:19,104 [Thread-42] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
>> >
>> > InputSizeReducerEstimator cannot calculate the input file size, so it
>> > doesn't estimate the number of reducers.
>> > But I think I gave the right Hadoop file path.
>> > I tried many possible paths, like:
>> >
>> >   relative-path/to/file
>> >   /user/myuser/absolute-path/to/file
>> >   hdfs://host:8020/user/myuser/absolute-path/to/file
>> >   hdfs://host:9000/user/myuser/absolute-path/to/file   (changing the HDFS port)
>> >
>> > etc...
>> >
>> > But Pig still failed to estimate the number of reducers.
>> >
>> > I am almost defeated by this problem...
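One way to work around the broken estimate is to set the parallelism
explicitly; the log message above ("no requested or default parallelism
set") falls back to exactly that. A minimal embedded-Pig sketch, with a
made-up script, paths, and reducer count:

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    // Workaround sketch: bypass the estimator with an explicit default
    // parallelism. The query and paths are hypothetical; the same effect
    // comes from "SET default_parallel 10;" in a script or a PARALLEL
    // clause on the grouping statement.
    public class DefaultParallelWorkaround {
        public static void main(String[] args) throws Exception {
            PigServer pig = new PigServer(ExecType.MAPREDUCE);
            pig.setDefaultParallel(10); // used when no estimate is available
            pig.registerQuery("a = LOAD '/user/myuser/absolute-path/to/file' AS (k, v);");
            pig.registerQuery("b = GROUP a BY k;");  // the reduce-side step
            pig.registerQuery("c = FOREACH b GENERATE group, COUNT(a);");
            pig.store("c", "/user/myuser/output");   // runs with 10 reducers
        }
    }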
>> >
>> >
>> >
>> > 2014-02-06 21:31 GMT+09:00 최종원 <jongwons.c...@gmail.com>:
>> >
>> > > Hello.
>> > >
>> > > My Pig job always makes one reduce task in version 0.12.0-h2, because
>> > > InputSizeReducerEstimator always returns -1 for the input file size.
>> > >
>> > > I'm not sure of the reason, but PlanHelper.getPhysicalOperators
>> > > always returns a zero-size list.
>> > >
>> > >
>> > >> public int estimateNumberOfReducers(Job job, MapReduceOper mapReduceOper) throws IOException {
>> > >>     Configuration conf = job.getConfiguration();
>> > >>     long bytesPerReducer = conf.getLong(BYTES_PER_REDUCER_PARAM, DEFAULT_BYTES_PER_REDUCER);
>> > >>     int maxReducers = conf.getInt(MAX_REDUCER_COUNT_PARAM, DEFAULT_MAX_REDUCER_COUNT_PARAM);
>> > >>     List<POLoad> poLoads = PlanHelper.getPhysicalOperators(mapReduceOper.mapPlan, POLoad.class);
>> > >>     long totalInputFileSize = getTotalInputFileSize(conf, poLoads, job);
>> > >>     log.info("BytesPerReducer=" + bytesPerReducer + " maxReducers="
>> > >>         + maxReducers + " totalInputFileSize=" + totalInputFileSize);
>> > >>     // if totalInputFileSize == -1, we couldn't get the input size so we can't estimate.
>> > >>     if (totalInputFileSize == -1) { return -1; }
>> > >>     int reducers = (int) Math.ceil((double) totalInputFileSize / bytesPerReducer);
>> > >>     reducers = Math.max(1, reducers);
>> > >>     reducers = Math.min(maxReducers, reducers);
>> > >>     return reducers;
>> > >> }
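For context on the -1: getTotalInputFileSize has to resolve each load's path
to concrete files and sum their lengths. A simplified sketch of that idea
(not Pig's actual implementation) shows why an unresolvable path can only
yield -1:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Simplified illustration of an input-size computation; this is NOT
    // Pig's getTotalInputFileSize, just its general shape. When the path
    // cannot be resolved to files, the only honest answer is -1, which
    // the estimator above treats as "cannot estimate".
    public class InputSizeSketch {
        static long totalInputFileSize(Configuration conf, Path input) throws IOException {
            FileSystem fs = input.getFileSystem(conf);
            if (!fs.exists(input)) {
                return -1; // unknown -> caller falls back to 1 reducer
            }
            long total = 0;
            for (FileStatus status : fs.listStatus(input)) {
                total += status.getLen();
            }
            return total;
        }
    }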
>> > >
>> > >
>> > >
>> > > And the Pig job ends successfully.
>> > >
>> > > But because the reduce phase is planned as a single task, it takes a
>> > > very long time.
>> > >
>> > >
>> > > I tried this with Apache Hadoop 2.2.0 and Pig 0.12.0 (h2),
>> > > and also another setup installed via Ambari 1.4.3.
>> > >
>> > > The result is always the same.
>> > >
>> > > What is wrong ???
>> > >
>> >
>>
>
>
