This is the log ...

2014-02-06 17:29:19,087 [Thread-42] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Reduce phase detected, estimating # of required reducers.
2014-02-06 17:29:19,087 [Thread-42] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Using reducer estimator:
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2014-02-06 17:29:19,087 [Thread-42] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
- BytesPerReducer=100000000 maxReducers=999 totalInputFileSize=-1
2014-02-06 17:29:19,087 [Thread-42] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Could not estimate number of reducers and no requested or default
parallelism set. Defaulting to 1 reducer.
2014-02-06 17:29:19,087 [Thread-42] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting Parallelism to 1
2014-02-06 17:29:19,104 [Thread-42] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
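
For reference, if the estimator had gotten an input size, the arithmetic
behind these messages is simple. A minimal sketch using the values from the
log (the class name and the 1 GB input size are hypothetical; my run only
ever sees -1):

    public class EstimateSketch {
        public static void main(String[] args) {
            long bytesPerReducer = 100000000L;      // BytesPerReducer from the log (~100 MB)
            int maxReducers = 999;                  // maxReducers from the log
            long totalInputFileSize = 1000000000L;  // hypothetical 1 GB input

            // One reducer per bytesPerReducer of input, clamped to [1, maxReducers].
            int reducers = (int) Math.ceil((double) totalInputFileSize / bytesPerReducer);
            reducers = Math.max(1, reducers);
            reducers = Math.min(maxReducers, reducers);
            System.out.println(reducers);           // prints 10
        }
    }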

The InputSizeReducerEstimator cannot calculate the input file size, so it
doesn't estimate the number of reducers.
But I think I gave the right Hadoop file path.
I tried many possible paths, like...

  relative-path/to/file
  /user/myuser/absolute-path/to/file
  hdfs://host:8020/user/myuser/absolute-path/to/file
  hdfs://host:9000/user/myuser/absolute-path/to/file   (with the HDFS port changed)

etc...
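
For what it's worth, a standalone check along these lines (my own sketch,
not Pig code; the class name is mine) should show whether the Hadoop client
can resolve a given path and read its size at all, which is the same
information the estimator needs:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CheckInputSize {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();  // picks up core-site.xml / hdfs-site.xml
            Path p = new Path(args[0]);                // any of the path variants above
            FileSystem fs = p.getFileSystem(conf);
            FileStatus[] matches = fs.globStatus(p);
            if (matches == null) {
                System.out.println("no match for " + p);
                return;
            }
            for (FileStatus s : matches) {
                System.out.println(s.getPath() + " -> " + s.getLen() + " bytes");
            }
        }
    }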

In every case, Pig failed to estimate the number of reducers.

I am almost defeated by this problem.
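
For now, the only workaround I see is to request the parallelism explicitly
instead of relying on the estimator. A sketch via the Java embedding API
(the class and script names are hypothetical; "SET default_parallel 10;"
inside the script itself should have the same effect):

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class ForceParallel {
        public static void main(String[] args) throws Exception {
            PigServer pig = new PigServer(ExecType.MAPREDUCE);
            pig.setDefaultParallel(10);          // same effect as "SET default_parallel 10;"
            pig.registerScript("myscript.pig");  // hypothetical script path
        }
    }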



2014-02-06 21:31 GMT+09:00 최종원 <jongwons.c...@gmail.com>:

> Hello.
>
> My Pig job always makes only one reduce task in version 0.12.0-h2, ... because
>
> the InputSizeReducerEstimator class always returns -1 for the input file size.
>
> I'm not sure of the reason, but the PlanHelper.getPhysicalOperators
> method always returns an empty (size-0) list.
>
>
>> public int estimateNumberOfReducers(Job job, MapReduceOper mapReduceOper)
>>         throws IOException {
>>     Configuration conf = job.getConfiguration();
>>     long bytesPerReducer = conf.getLong(BYTES_PER_REDUCER_PARAM,
>>         DEFAULT_BYTES_PER_REDUCER);
>>     int maxReducers = conf.getInt(MAX_REDUCER_COUNT_PARAM,
>>         DEFAULT_MAX_REDUCER_COUNT_PARAM);
>>     List<POLoad> poLoads = PlanHelper.getPhysicalOperators(
>>         mapReduceOper.mapPlan, POLoad.class);
>>     long totalInputFileSize = getTotalInputFileSize(conf, poLoads, job);
>>     log.info("BytesPerReducer=" + bytesPerReducer + " maxReducers="
>>         + maxReducers + " totalInputFileSize=" + totalInputFileSize);
>>     // if totalInputFileSize == -1, we couldn't get the input size
>>     // so we can't estimate.
>>     if (totalInputFileSize == -1) { return -1; }
>>     int reducers = (int) Math.ceil((double) totalInputFileSize / bytesPerReducer);
>>     reducers = Math.max(1, reducers);
>>     reducers = Math.min(maxReducers, reducers);
>>     return reducers;
>> }
>
>
>
> and the Pig job ends successfully.
>
> But because the reduce phase is planned as a single task, it takes a
> very long time.
>
>
> I tried this on Apache Hadoop 2.2.0 with Pig 0.12.0 (h2),
>
> and also on another version installed through Ambari 1.4.3.
>
> The result is always the same.
>
>
> What is wrong???
>
