Hello. My Pig job is always planned with just one reduce task on version 0.12.0-h2, ... because
the InputSizeReducerEstimator class always returns -1 as the input file size. I am not sure of the reason, but PlanHelper.getPhysicalOperators always returns a zero-size list. For reference, this is the estimateNumberOfReducers method:

    public int estimateNumberOfReducers(Job job, MapReduceOper mapReduceOper) throws IOException {
        Configuration conf = job.getConfiguration();

        long bytesPerReducer = conf.getLong(BYTES_PER_REDUCER_PARAM, DEFAULT_BYTES_PER_REDUCER);
        int maxReducers = conf.getInt(MAX_REDUCER_COUNT_PARAM, DEFAULT_MAX_REDUCER_COUNT_PARAM);

        List<POLoad> poLoads = PlanHelper.getPhysicalOperators(mapReduceOper.mapPlan, POLoad.class);
        long totalInputFileSize = getTotalInputFileSize(conf, poLoads, job);

        log.info("BytesPerReducer=" + bytesPerReducer + " maxReducers="
                + maxReducers + " totalInputFileSize=" + totalInputFileSize);

        // if totalInputFileSize == -1, we couldn't get the input size so we can't estimate.
        if (totalInputFileSize == -1) { return -1; }

        int reducers = (int) Math.ceil((double) totalInputFileSize / bytesPerReducer);
        reducers = Math.max(1, reducers);
        reducers = Math.min(maxReducers, reducers);
        return reducers;
    }

The Pig job itself finishes successfully, but because the reduce phase is planned with only one task, it takes a very long time. I tried this on Apache Hadoop 2.2.0 with Pig 0.12.0 (h2), and also on another cluster installed via Ambari 1.4.3. The result is always the same. What am I doing wrong?
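For context, even a trivial script that forces a reduce phase shows this behaviour. This is just a minimal sketch, not my real job; the paths and aliases are hypothetical placeholders:

    -- hypothetical minimal reproduction: GROUP forces a reduce phase,
    -- and the estimator plans it with a single reducer
    raw = LOAD '/tmp/input' AS (key:chararray, value:long);
    grp = GROUP raw BY key;
    cnt = FOREACH grp GENERATE group, COUNT(raw);
    STORE cnt INTO '/tmp/output';

I suppose I could work around it by setting the parallelism by hand (SET default_parallel at the top of the script, or a PARALLEL clause on the GROUP), but I would like to understand why the automatic estimation fails.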