Oh... oh.
The current development version is not applicable to my service.

My Pig job failed due to another problem...
I want to find out why the job failed... but I don't have much time.
It looks like a more serious problem.

Right now, the pig-0.12.0-h2 version is more stable in my case,
so I've decided to use pig-0.12.0-h2 and wait for the public release of
pig-0.12.1-h2.
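
(In the meantime, since the log below says "no requested or default
parallelism set", one workaround might be to set an explicit default
parallelism instead of relying on the estimator. A minimal sketch using
PigServer; the script name and the value 10 are placeholders:)

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class RunWithFixedParallelism {
        public static void main(String[] args) throws Exception {
            PigServer pig = new PigServer(ExecType.MAPREDUCE);
            // same effect as "SET default_parallel 10;" in a Pig script
            pig.setDefaultParallel(10);
            pig.registerScript("myscript.pig"); // hypothetical script name
        }
    }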





2014-02-07 18:35 GMT+09:00 최종원 <jongwons.c...@gmail.com>:

> Finally, I solved the problem. Thank you.
>
> It is fixed in version 0.12.1.
>
> I downloaded the source code from GitHub, changed the Pig version, and
> built 0.12.1 with the h2 option.
>
> Now it calculates the sizes of the input source files and creates multiple
> reduce tasks...
>
> Thank you very much... happy weekend ~ bye
>
>
>
>
> 2014-02-07 15:48 GMT+09:00 최종원 <jongwons.c...@gmail.com>:
>
>> Thank you for your answer.
>>
>> But where can I find the Pig source for the pig-0.12.0-h2 version?
>> I think there must be a difference between pig-0.12.0 and pig-0.12.0-h2,
>> but I cannot find the source for version 0.12.0-h2.
>>
>> When I extract the jar file, there are additional packages (like
>> org.apache.pig.backend.hadoop23.PigJobControl ...).
>>
>>
>>
>>
>>
>>
>>
>> 2014-02-07 10:59 GMT+09:00 Cheolsoo Park <piaozhe...@gmail.com>:
>>
>>> Hi,
>>>
>>> Sounds like you're bitten by PIG-3512:
>>> https://issues.apache.org/jira/browse/PIG-3512
>>>
>>> Can you try to apply the patch and rebuild the jar?
>>>
>>> Thanks,
>>> Cheolsoo
>>>
>>>
>>>
>>> On Thu, Feb 6, 2014 at 7:27 PM, 최종원 <jongwons.c...@gmail.com> wrote:
>>>
>>> > This is the log ...
>>> >
>>> > 2014-02-06 17:29:19,087 [Thread-42] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
>>> > 2014-02-06 17:29:19,087 [Thread-42] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
>>> > 2014-02-06 17:29:19,087 [Thread-42] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=100000000 maxReducers=999 totalInputFileSize=-1
>>> > 2014-02-06 17:29:19,087 [Thread-42] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Could not estimate number of reducers and no requested or default parallelism set. Defaulting to 1 reducer.
>>> > 2014-02-06 17:29:19,087 [Thread-42] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
>>> > 2014-02-06 17:29:19,104 [Thread-42] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
>>> >
>>> > InputSizeReducerEstimator cannot calculate the input file size, so it
>>> > doesn't estimate the number of reducers.
>>> > But I think I gave the right Hadoop file path.
>>> > I tried many possible paths, like...
>>> >
>>> >   relative-path/to/file
>>> >   /user/myuser/absolute-path/to/file
>>> >   hdfs://host:8020/user/myuser/absolute-path/to/file
>>> >   hdfs://host:9000/user/myuser/absolute-path/to/file  (changed the HDFS port)
>>> >
>>> > etc...
>>> >
>>> > but Pig still failed to estimate the number of reducers.
>>> >
>>> > I am almost defeated by this problem...
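>>> >
>>> > (A sanity check one could run on the path itself, outside Pig: measure
>>> > the input size directly through the Hadoop FileSystem API, which is the
>>> > same kind of lookup the estimator needs. A minimal sketch; the host,
>>> > port, and path are just the placeholder examples above:)
>>> >
>>> >     import org.apache.hadoop.conf.Configuration;
>>> >     import org.apache.hadoop.fs.FileSystem;
>>> >     import org.apache.hadoop.fs.Path;
>>> >
>>> >     public class InputSizeCheck {
>>> >         public static void main(String[] args) throws Exception {
>>> >             Configuration conf = new Configuration();
>>> >             // placeholder path from the examples above
>>> >             Path path = new Path("hdfs://host:8020/user/myuser/absolute-path/to/file");
>>> >             FileSystem fs = path.getFileSystem(conf);
>>> >             if (!fs.exists(path)) {
>>> >                 System.out.println("path does not resolve: " + path);
>>> >                 return;
>>> >             }
>>> >             // total length of all files under the path
>>> >             long total = fs.getContentSummary(path).getLength();
>>> >             System.out.println("totalInputFileSize=" + total);
>>> >         }
>>> >     }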
>>> >
>>> >
>>> >
>>> > 2014-02-06 21:31 GMT+09:00 최종원 <jongwons.c...@gmail.com>:
>>> >
>>> > > Hello.
>>> > >
>>> > > My Pig job always produces one reduce task in version 0.12.0-h2,
>>> > > because the InputSizeReducerEstimator class always returns -1 as the
>>> > > input file size.
>>> > >
>>> > > I'm not sure of the reason, but the PlanHelper.getPhysicalOperators
>>> > > method always returns an empty list.
>>> > >
>>> > >> public int estimateNumberOfReducers(Job job, MapReduceOper mapReduceOper) throws IOException {
>>> > >>     Configuration conf = job.getConfiguration();
>>> > >>
>>> > >>     long bytesPerReducer = conf.getLong(BYTES_PER_REDUCER_PARAM, DEFAULT_BYTES_PER_REDUCER);
>>> > >>     int maxReducers = conf.getInt(MAX_REDUCER_COUNT_PARAM, DEFAULT_MAX_REDUCER_COUNT_PARAM);
>>> > >>
>>> > >>     List<POLoad> poLoads = PlanHelper.getPhysicalOperators(mapReduceOper.mapPlan, POLoad.class);
>>> > >>     long totalInputFileSize = getTotalInputFileSize(conf, poLoads, job);
>>> > >>
>>> > >>     log.info("BytesPerReducer=" + bytesPerReducer + " maxReducers="
>>> > >>         + maxReducers + " totalInputFileSize=" + totalInputFileSize);
>>> > >>
>>> > >>     // if totalInputFileSize == -1, we couldn't get the input size so we can't estimate.
>>> > >>     if (totalInputFileSize == -1) { return -1; }
>>> > >>
>>> > >>     int reducers = (int) Math.ceil((double) totalInputFileSize / bytesPerReducer);
>>> > >>     reducers = Math.max(1, reducers);
>>> > >>     reducers = Math.min(maxReducers, reducers);
>>> > >>     return reducers;
>>> > >> }
>>> > >
>>> > >
>>> > >
>>> > > And the Pig job finishes successfully.
>>> > >
>>> > > But since the reduce phase is planned as a single task, it takes a
>>> > > very long time.
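>>> > >
>>> > > (For scale, a rough worked example, not from the thread: with the
>>> > > values in the log, pig.exec.reducers.bytes.per.reducer = 100000000
>>> > > and pig.exec.reducers.max = 999, a hypothetical 2.5 GB input would
>>> > > have been planned as 25 reducers instead of 1:)
>>> > >
>>> > >     long bytesPerReducer = 100000000L;     // BytesPerReducer from the log
>>> > >     int maxReducers = 999;                 // maxReducers from the log
>>> > >     long totalInputFileSize = 2500000000L; // hypothetical 2.5 GB input
>>> > >
>>> > >     int reducers = (int) Math.ceil((double) totalInputFileSize / bytesPerReducer); // = 25
>>> > >     reducers = Math.max(1, reducers);           // lower bound: 1
>>> > >     reducers = Math.min(maxReducers, reducers); // upper bound: 999 -> stays 25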
>>> > >
>>> > >
>>> > > I tried it with Apache Hadoop 2.2.0 and Pig 0.12.0 (h2),
>>> > >
>>> > > and also with another build installed via Ambari 1.4.3.
>>> > >
>>> > > The result is always the same.
>>> > >
>>> > >
>>> > > What is wrong???
>>> > >
>>> >
>>>
>>
>>
>
