Thank you for your answer. But where can I find the Pig source for the pig-0.12.0-h2 version? I think there must be a difference between pig-0.12.0 and pig-0.12.0-h2, but I cannot find a source release for version 0.12.0-h2. When I extract the jar file, there are additional packages (like org.apache.pig.backend.hadoop23.PigJobControl ...).
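As far as I can tell, there is no separate 0.12.0-h2 source tree: the -h2 jar is built from the same pig-0.12.0 source, just compiled against Hadoop 2, which is when the extra hadoop23 shim classes (such as org.apache.pig.backend.hadoop23.PigJobControl) get included. If that is right, a build along these lines should reproduce the -h2 jar after applying the PIG-3512 patch (assuming ant and the 0.12.0 source release; hadoopversion=23 is the switch the Pig build uses to target Hadoop 2.x):

    ant clean jar -Dhadoopversion=23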
2014-02-07 10:59 GMT+09:00 Cheolsoo Park <piaozhe...@gmail.com>:

> Hi,
>
> Sounds like you're bitten by PIG-3512 -
> https://issues.apache.org/jira/browse/PIG-3512
>
> Can you try to apply the patch and rebuild the jar?
>
> Thanks,
> Cheolsoo
>
> On Thu, Feb 6, 2014 at 7:27 PM, 최종원 <jongwons.c...@gmail.com> wrote:
>
> > This is the log ...
> >
> > 2014-02-06 17:29:19,087 [Thread-42] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
> > 2014-02-06 17:29:19,087 [Thread-42] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
> > 2014-02-06 17:29:19,087 [Thread-42] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=100000000 maxReducers=999 totalInputFileSize=-1
> > 2014-02-06 17:29:19,087 [Thread-42] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Could not estimate number of reducers and no requested or default parallelism set. Defaulting to 1 reducer.
> > 2014-02-06 17:29:19,087 [Thread-42] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
> > 2014-02-06 17:29:19,104 [Thread-42] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
> >
> > InputSizeReducerEstimator cannot calculate the input file size, so it doesn't estimate the number of reducers. But I think I gave the right HDFS file path. I tried many possible paths, like:
> >
> > relative-path/to/file
> > /user/myuser/absolute-path/to/file
> > hdfs://host:8020/user/myuser/absolute-path/to/file
> > hdfs://host:9000/user/myuser/absolute-path/to/file/change-the-hdfs-port
> >
> > etc... but Pig still failed to estimate the number of reducers.
> >
> > I am almost defeated... by this problem.
> >
> > 2014-02-06 21:31 GMT+09:00 최종원 <jongwons.c...@gmail.com>:
> >
> > > Hello.
> > >
> > > My Pig job always ends up with one reduce task in version 0.12.0-h2, ... because the InputSizeReducerEstimator class always returns an input file size of -1. I'm not sure of the reason, but actually, the PlanHelper.getPhysicalOperators method always returns a zero-size list.
> > >
> > > public int estimateNumberOfReducers(Job job, MapReduceOper mapReduceOper) throws IOException {
> > >     Configuration conf = job.getConfiguration();
> > >     long bytesPerReducer = conf.getLong(BYTES_PER_REDUCER_PARAM, DEFAULT_BYTES_PER_REDUCER);
> > >     int maxReducers = conf.getInt(MAX_REDUCER_COUNT_PARAM, DEFAULT_MAX_REDUCER_COUNT_PARAM);
> > >     List<POLoad> poLoads = PlanHelper.getPhysicalOperators(mapReduceOper.mapPlan, POLoad.class);
> > >     long totalInputFileSize = getTotalInputFileSize(conf, poLoads, job);
> > >     log.info("BytesPerReducer=" + bytesPerReducer + " maxReducers=" + maxReducers + " totalInputFileSize=" + totalInputFileSize);
> > >     // if totalInputFileSize == -1, we couldn't get the input size so we can't estimate.
> > >     if (totalInputFileSize == -1) { return -1; }
> > >     int reducers = (int) Math.ceil((double) totalInputFileSize / bytesPerReducer);
> > >     reducers = Math.max(1, reducers);
> > >     reducers = Math.min(maxReducers, reducers);
> > >     return reducers;
> > > }
> > >
> > > The Pig job ends successfully, but because the reduce phase is planned as a single task, it takes a very long time.
> > >
> > > I tried it on Apache Hadoop 2.2.0 with Pig 0.12.0 (h2), and also on another installation set up with Ambari 1.4.3. The result is always the same.
> > >
> > > What was wrong???
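For reference, here is a minimal sketch of how this kind of input-size lookup can end up at -1. It is not the actual Pig code (the real logic lives in InputSizeReducerEstimator.getTotalInputFileSize, and the class and method names below are made up for illustration): if any load location cannot be inspected on the FileSystem, the total is reported as unknown and the estimator gives up.

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class InputSizeSketch {
        // Total size in bytes of all load locations, or -1 if any location
        // cannot be inspected (bad URI, missing path, non-file input, ...).
        static long totalInputFileSize(Configuration conf, List<String> locations) {
            long total = 0;
            for (String location : locations) {
                try {
                    Path path = new Path(location);
                    FileSystem fs = path.getFileSystem(conf);
                    total += fs.getContentSummary(path).getLength();
                } catch (Exception e) {
                    return -1; // unknown size; the caller falls back to 1 reducer
                }
            }
            return total;
        }
    }

Note also that the log line "no requested or default parallelism set" hints at a workaround while the bug is open: adding PARALLEL to the reduce-side operator, or "set default_parallel N;" in the script, should sidestep the estimator entirely.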