Thank you for your answer. But where can I find the Pig source for the pig-0.12.0-h2 version? I think there must be a difference between pig-0.12.0 and pig-0.12.0-h2, but I cannot find a source release for version 0.12.0-h2. When I extract the jar file, there are additional packages (like org.apache.pig.backend.hadoop23.PigJobControl ...).
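As far as I can tell, there is no separate 0.12.0-h2 source tree: the -h2 jar is built from the same pig-0.12.0 source, just compiled against Hadoop 2, which is when the extra hadoop23 shim classes (such as org.apache.pig.backend.hadoop23.PigJobControl) get included. If that is right, a build along these lines should reproduce the -h2 jar after applying the PIG-3512 patch (assuming ant and the 0.12.0 source release; hadoopversion=23 is the switch the Pig build uses to target Hadoop 2.x):

    ant clean jar -Dhadoopversion=23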
2014-02-07 10:59 GMT+09:00 Cheolsoo Park <piaozhe...@gmail.com>:

> Hi,
>
> Sounds like you're bitten by PIG-3512 -
> https://issues.apache.org/jira/browse/PIG-3512
>
> Can you try to apply the patch and rebuild the jar?
>
> Thanks,
> Cheolsoo
>
> On Thu, Feb 6, 2014 at 7:27 PM, 최종원 <jongwons.c...@gmail.com> wrote:
>
> > This is the log ...
> >
> > 2014-02-06 17:29:19,087 [Thread-42] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
> > 2014-02-06 17:29:19,087 [Thread-42] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
> > 2014-02-06 17:29:19,087 [Thread-42] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=100000000 maxReducers=999 totalInputFileSize=-1
> > 2014-02-06 17:29:19,087 [Thread-42] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Could not estimate number of reducers and no requested or default parallelism set. Defaulting to 1 reducer.
> > 2014-02-06 17:29:19,087 [Thread-42] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
> > 2014-02-06 17:29:19,104 [Thread-42] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
> >
> > InputSizeReducerEstimator cannot calculate the input file size, so it doesn't estimate the number of reducers. But I think I gave the right HDFS file path. I tried many possible paths, like:
> >
> > relative-path/to/file
> > /user/myuser/absolute-path/to/file
> > hdfs://host:8020/user/myuser/absolute-path/to/file
> > hdfs://host:9000/user/myuser/absolute-path/to/file/change-the-hdfs-port
> >
> > etc... but Pig still failed to estimate the number of reducers.
> >
> > I am almost defeated... by this problem.
> >
> > 2014-02-06 21:31 GMT+09:00 최종원 <jongwons.c...@gmail.com>:
> >
> > > Hello.
> > >
> > > My Pig job always ends up with one reduce task in version 0.12.0-h2, ... because the InputSizeReducerEstimator class always returns an input file size of -1. I'm not sure of the reason, but actually, the PlanHelper.getPhysicalOperators method always returns a zero-size list.
> > >
> > > public int estimateNumberOfReducers(Job job, MapReduceOper mapReduceOper) throws IOException {
> > >     Configuration conf = job.getConfiguration();
> > >     long bytesPerReducer = conf.getLong(BYTES_PER_REDUCER_PARAM, DEFAULT_BYTES_PER_REDUCER);
> > >     int maxReducers = conf.getInt(MAX_REDUCER_COUNT_PARAM, DEFAULT_MAX_REDUCER_COUNT_PARAM);
> > >     List<POLoad> poLoads = PlanHelper.getPhysicalOperators(mapReduceOper.mapPlan, POLoad.class);
> > >     long totalInputFileSize = getTotalInputFileSize(conf, poLoads, job);
> > >     log.info("BytesPerReducer=" + bytesPerReducer + " maxReducers=" + maxReducers + " totalInputFileSize=" + totalInputFileSize);
> > >     // if totalInputFileSize == -1, we couldn't get the input size so we can't estimate.
> > >     if (totalInputFileSize == -1) { return -1; }
> > >     int reducers = (int) Math.ceil((double) totalInputFileSize / bytesPerReducer);
> > >     reducers = Math.max(1, reducers);
> > >     reducers = Math.min(maxReducers, reducers);
> > >     return reducers;
> > > }
> > >
> > > The Pig job ends successfully, but because the reduce phase is planned as a single task, it takes a very long time.
> > >
> > > I tried it on Apache Hadoop 2.2.0 with Pig 0.12.0 (h2), and also on another installation set up with Ambari 1.4.3. The result is always the same.
> > >
> > > What was wrong???
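For reference, here is a minimal sketch of how this kind of input-size lookup can end up at -1. It is not the actual Pig code (the real logic lives in InputSizeReducerEstimator.getTotalInputFileSize, and the class and method names below are made up for illustration): if any load location cannot be inspected on the FileSystem, the total is reported as unknown and the estimator gives up.

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class InputSizeSketch {
        // Total size in bytes of all load locations, or -1 if any location
        // cannot be inspected (bad URI, missing path, non-file input, ...).
        static long totalInputFileSize(Configuration conf, List<String> locations) {
            long total = 0;
            for (String location : locations) {
                try {
                    Path path = new Path(location);
                    FileSystem fs = path.getFileSystem(conf);
                    total += fs.getContentSummary(path).getLength();
                } catch (Exception e) {
                    return -1; // unknown size; the caller falls back to 1 reducer
                }
            }
            return total;
        }
    }

Note also that the log line "no requested or default parallelism set" hints at a workaround while the bug is open: adding PARALLEL to the reduce-side operator, or "set default_parallel N;" in the script, should sidestep the estimator entirely.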