I hit https://issues.apache.org/jira/browse/PIG-3512
On 24/03/2014 at 14:40, Vincent Barat wrote:
Hi,
Since I moved from Pig 0.10.0 to 0.11.0 or 0.12.0, the estimation
of the number of reducers no longer works.
My script:
A = load 'data';
B = group A by $0;
store B into 'out';
My data:
grunt> ls
hdfs://computation-master.dev.ubithere.com:9000/user/root/.staging <dir>
hdfs://computation-master.dev.ubithere.com:9000/user/root/data<r 3> 1908911680
When I run my script (see the last line):
Apache Pig version 0.12.1-SNAPSHOT (rexported) compiled Feb 06
2014, 16:57:49
Logging error messages to: /root/pig.log
Default bootup file /root/.pigbootup not found
Connecting to hadoop file system at: hdfs://computation-master.dev.ubithere.com:9000
Connecting to map-reduce job tracker at: computation-master.dev.ubithere.com:9001
Pig features used in the script: GROUP_BY
{RULES_ENABLED=[AddForEach, ColumnMapKeyPrune,
DuplicateForEachColumnRewrite, GroupByConstParallelSetter,
ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter,
MergeFilter, MergeForEach, NewPartitionFilterOptimizer,
PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter,
SplitFilter, StreamTypeCastInserter],
RULES_DISABLED=[FilterLogicExpressionSimplifier]}
File concatenation threshold: 100 optimistic? false
MR plan size before optimization: 1
MR plan size after optimization: 1
Pig script settings are added to the job
mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
creating jar file Job7470230163933306330.jar
jar file Job7470230163933306330.jar created
Setting up single store job
Reduce phase detected, estimating # of required reducers.
Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=-1
Could not estimate number of reducers and no requested or default parallelism set. Defaulting to 1 reducer.
Setting Parallelism to 1
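(FWIW, forcing the parallelism explicitly should sidestep the broken estimator; the value 10 below is just an arbitrary example, not a recommendation:)

```
SET default_parallel 10;
-- or per statement:
B = group A by $0 PARALLEL 10;
```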
I tried to debug: in the source code below, PlanHelper.getPhysicalOperators always returns an empty list.
public int estimateNumberOfReducers(Job job, MapReduceOper mapReduceOper) throws IOException {
    Configuration conf = job.getConfiguration();
    long bytesPerReducer = conf.getLong(BYTES_PER_REDUCER_PARAM, DEFAULT_BYTES_PER_REDUCER);
    int maxReducers = conf.getInt(MAX_REDUCER_COUNT_PARAM, DEFAULT_MAX_REDUCER_COUNT_PARAM);
    List<POLoad> poLoads = PlanHelper.getPhysicalOperators(mapReduceOper.mapPlan, POLoad.class);
    long totalInputFileSize = getTotalInputFileSize(conf, poLoads, job);
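For reference, here is a rough sketch of the arithmetic the estimator is supposed to perform (my own reconstruction, not the actual Pig source): reducers = ceil(totalInputFileSize / bytesPerReducer), clamped to [1, maxReducers]. With the empty POLoad list the total input size comes back as -1, so estimation is abandoned and Pig falls back to 1 reducer, matching the log above:

```java
// Hypothetical sketch of InputSizeReducerEstimator's math (illustrative only).
public class ReducerEstimateSketch {
    static int estimate(long totalInputFileSize, long bytesPerReducer, int maxReducers) {
        if (totalInputFileSize < 0) {
            return -1; // size unknown: caller falls back to default parallelism (1)
        }
        int reducers = (int) Math.ceil((double) totalInputFileSize / bytesPerReducer);
        return Math.min(Math.max(reducers, 1), maxReducers);
    }

    public static void main(String[] args) {
        // My 1.9 GB input with BytesPerReducer=1000000000 should give 2 reducers:
        System.out.println(estimate(1908911680L, 1000000000L, 999)); // 2
        // But totalInputFileSize=-1 makes the estimate fail:
        System.out.println(estimate(-1L, 1000000000L, 999)); // -1
    }
}
```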
Any idea?
Thanks for your help