Hello.

My Pig job always runs with a single reducer in version 0.12.0-h2, because

the InputSizeReducerEstimator class always returns -1 for the total input file size.

I'm not sure of the reason, but PlanHelper.getPhysicalOperators always
returns an empty (zero-size) list of POLoad operators.


  public int estimateNumberOfReducers(Job job, MapReduceOper mapReduceOper)
          throws IOException {
      Configuration conf = job.getConfiguration();
      long bytesPerReducer = conf.getLong(BYTES_PER_REDUCER_PARAM,
              DEFAULT_BYTES_PER_REDUCER);
      int maxReducers = conf.getInt(MAX_REDUCER_COUNT_PARAM,
              DEFAULT_MAX_REDUCER_COUNT_PARAM);

      List<POLoad> poLoads = PlanHelper.getPhysicalOperators(
              mapReduceOper.mapPlan, POLoad.class);
      long totalInputFileSize = getTotalInputFileSize(conf, poLoads, job);

      log.info("BytesPerReducer=" + bytesPerReducer + " maxReducers="
          + maxReducers + " totalInputFileSize=" + totalInputFileSize);

      // if totalInputFileSize == -1, we couldn't get the input size so
      // we can't estimate.
      if (totalInputFileSize == -1) { return -1; }

      int reducers = (int) Math.ceil((double) totalInputFileSize / bytesPerReducer);
      reducers = Math.max(1, reducers);
      reducers = Math.min(maxReducers, reducers);
      return reducers;
  }
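
If I read this correctly, the arithmetic itself is fine; only the -1 input
size is the problem. Here is a standalone sketch of just that calculation,
assuming the defaults I see in the 0.12.0 source
(pig.exec.reducers.bytes.per.reducer = 1000000000 and pig.exec.reducers.max = 999):

    // Sketch of the estimator's arithmetic, detached from Pig.
    // The constants mirror what I believe the 0.12.0 defaults are.
    public class ReducerEstimateSketch {
        static final long BYTES_PER_REDUCER = 1000L * 1000L * 1000L; // 1 GB
        static final int MAX_REDUCERS = 999;

        static int estimate(long totalInputFileSize) {
            if (totalInputFileSize == -1) {
                return -1; // unknown size; in my runs this ends up as 1 reducer
            }
            int reducers = (int) Math.ceil((double) totalInputFileSize / BYTES_PER_REDUCER);
            reducers = Math.max(1, reducers);
            return Math.min(MAX_REDUCERS, reducers);
        }

        public static void main(String[] args) {
            System.out.println(estimate(10_000_000_000L)); // 10 GB of input -> 10
            System.out.println(estimate(-1L));             // unknown size -> -1
        }
    }

So any positive input size would give a sensible reducer count; it is only
the -1 that collapses the plan to one reducer.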



The Pig job itself finishes successfully, but because the reduce phase is
planned with only one task, it takes a very long time.
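
As far as I understand, Pig only runs this estimator when no explicit
parallelism is requested, so a PARALLEL clause (or SET default_parallel in a
script) should avoid the single-reducer plan. A minimal sketch through
PigServer, where the alias names, the schema and the paths are made-up
placeholders:

    import java.util.Properties;

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class ExplicitParallelism {
        public static void main(String[] args) throws Exception {
            PigServer pig = new PigServer(ExecType.MAPREDUCE, new Properties());
            pig.registerQuery("A = LOAD '/tmp/input' AS (key:chararray, val:int);");
            // PARALLEL pins the reducer count for this operator, so the
            // estimator is never consulted for it.
            pig.registerQuery("B = GROUP A BY key PARALLEL 20;");
            pig.store("B", "/tmp/output");
        }
    }

But that is only a workaround; I would still like to know why the estimator
cannot see the input size.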


I tried this on Apache Hadoop 2.2.0 with Pig 0.12.0 (h2), and also on
another stack installed via Ambari 1.4.3.

The result is always the same.


What am I doing wrong?
