I hit https://issues.apache.org/jira/browse/PIG-3512

On 24/03/2014 14:40, Vincent Barat wrote:
Hi,

Since I moved from Pig 0.10.0 to 0.11.0 or 0.12.0, the estimation of the number of reducers no longer works.

My script:

A = load 'data';
B = group A by $0;
store B into 'out';

My data:

grunt> ls
hdfs://computation-master.dev.ubithere.com:9000/user/root/.staging <dir>
hdfs://computation-master.dev.ubithere.com:9000/user/root/data<r 3> 1908911680

When I run my script (see the last lines of the log):

Apache Pig version 0.12.1-SNAPSHOT (rexported) compiled Feb 06 2014, 16:57:49
Logging error messages to: /root/pig.log
Default bootup file /root/.pigbootup not found
Connecting to hadoop file system at: hdfs://computation-master.dev.ubithere.com:9000
Connecting to map-reduce job tracker at: computation-master.dev.ubithere.com:9001
Pig features used in the script: GROUP_BY
{RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
File concatenation threshold: 100 optimistic? false
MR plan size before optimization: 1
MR plan size after optimization: 1
Pig script settings are added to the job
mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
creating jar file Job7470230163933306330.jar
jar file Job7470230163933306330.jar created
Setting up single store job
Reduce phase detected, estimating # of required reducers.
Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=-1
Could not estimate number of reducers and no requested or default parallelism set. Defaulting to 1 reducer.
Setting Parallelism to 1
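For reference, the arithmetic the estimator is supposed to perform is roughly ceil(totalInputFileSize / bytesPerReducer), capped at maxReducers. The sketch below is a hypothetical standalone reproduction of that logic using the constants from the log above (BytesPerReducer=1000000000, maxReducers=999); it is not the actual Pig class, but it shows why a totalInputFileSize of -1 (size unknown) causes the fallback to 1 reducer:

```java
// Hypothetical sketch of InputSizeReducerEstimator's arithmetic
// (constants taken from the log above; not the real Pig source).
public class ReducerEstimateSketch {
    static final long BYTES_PER_REDUCER = 1000000000L; // from BytesPerReducer=1000000000
    static final int MAX_REDUCERS = 999;               // from maxReducers=999

    static int estimate(long totalInputFileSize) {
        if (totalInputFileSize < 0) {
            // Size unknown: this is the totalInputFileSize=-1 case in the log.
            // The estimator gives up and Pig defaults to 1 reducer.
            return -1;
        }
        int reducers = (int) Math.ceil((double) totalInputFileSize / BYTES_PER_REDUCER);
        reducers = Math.max(1, reducers);
        return Math.min(MAX_REDUCERS, reducers);
    }

    public static void main(String[] args) {
        // The 'data' file above is 1908911680 bytes, so a working
        // estimator should have chosen 2 reducers, not 1.
        System.out.println(estimate(1908911680L));
        System.out.println(estimate(-1L));
    }
}
```

With a correctly computed input size of 1908911680 bytes, the script should get 2 reducers; the bug makes the size come back as -1, so it gets 1.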

I tried to debug; in the source code below, PlanHelper.getPhysicalOperators always returns an empty list.

public int estimateNumberOfReducers(Job job, MapReduceOper mapReduceOper) throws IOException {
    Configuration conf = job.getConfiguration();

    long bytesPerReducer = conf.getLong(BYTES_PER_REDUCER_PARAM, DEFAULT_BYTES_PER_REDUCER);
    int maxReducers = conf.getInt(MAX_REDUCER_COUNT_PARAM, DEFAULT_MAX_REDUCER_COUNT_PARAM);

    List<POLoad> poLoads = PlanHelper.getPhysicalOperators(mapReduceOper.mapPlan, POLoad.class);
    long totalInputFileSize = getTotalInputFileSize(conf, poLoads, job);
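Until the estimator is fixed, one way to sidestep it entirely is to request parallelism explicitly, since the estimator only runs when no parallelism is set. `set default_parallel` and the `PARALLEL` clause are standard Pig Latin; the value 10 below is just an arbitrary example, not a recommendation:

```pig
set default_parallel 10;      -- script-wide default for reduce parallelism
A = load 'data';
B = group A by $0 PARALLEL 10; -- or per-operator, overriding the default
store B into 'out';
```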

Any idea?

Thanks for your help

