Re: number of reducers

Alex Rovner Fri, 01 Jun 2012 12:29:55 -0700

Thanks Bill. This is exactly what I was looking for.

We are using version 11, but not the latest from the trunk.

I would have to rebuild using the latest.

Alex

On Fri, Jun 1, 2012 at 12:55 PM, Bill Graham <[email protected]> wrote:

> What version of Pig are you running, and if it's not the trunk can you try
> with the trunk?
>
> There have been a number of improvements to how we get total input size
> when estimating reducers. Basically, the input size is now requested from
> the LoadFunc, which has more info about statistics.
>
> See
> https://issues.apache.org/jira/browse/PIG-2573
> https://issues.apache.org/jira/browse/PIG-2693
>
> On Fri, Jun 1, 2012 at 8:49 AM, Alex Rovner <[email protected]> wrote:
>
> > Hello,
> >
> > We have written a HiveLoader that loads data from a Hive warehouse
> > (HCatalog had roadblocks at the time, and we decided against using it).
> >
> > We have one minor issue that would be great to solve: currently Pig
> > cannot correctly estimate how many reducers to use when loading data
> > from a Hive warehouse.
> >
> > We have looked through the code and traced the problem to the following:
> >
> > Pig is using the location returned from "relativeToAbsolutePath" to
> > figure out how many reducers it needs. In the case of loading from Hive,
> > we do not know the paths that we need to load until the setPartition()
> > call is made. We can of course set the root of the table as the path in
> > the "relativeToAbsolutePath" call, but that would make Pig over-estimate
> > the number of reducers needed, since we won't take into account the
> > partition filtering that is taking place.
> >
> > Are there any workarounds for this issue?
> > From my understanding, it would be sufficient if relativeToAbsolutePath
> > were called after the setLocation and setPartition calls.
> >
> > Any input would be appreciated.
> >
> > Thanks
> > Alex
> >
>
>
>
> --
> *Note that I'm no longer using my Yahoo! email address. Please email me at
> [email protected] going forward.*
>
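
For readers following the thread, the over-estimation described above can be illustrated with a small sketch of a size-based reducer estimate. This is not Pig's actual code. It assumes the estimate is roughly ceil(input bytes / bytes per reducer), capped at a maximum, and it assumes the defaults of 1 GB per reducer (pig.exec.reducers.bytes.per.reducer) and 999 reducers (pig.exec.reducers.max). The class name, method name, and byte counts are hypothetical.

```java
public class ReducerEstimateSketch {
    // Assumed defaults for pig.exec.reducers.bytes.per.reducer
    // and pig.exec.reducers.max; check your own Pig configuration.
    static final long BYTES_PER_REDUCER = 1_000_000_000L;
    static final int MAX_REDUCERS = 999;

    // Hypothetical helper: ceiling division of input size by bytes per
    // reducer, with a floor of 1 and a cap of maxReducers.
    static int estimateReducers(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        long needed = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer;
        needed = Math.max(1L, needed);
        return (int) Math.min((long) maxReducers, needed);
    }

    public static void main(String[] args) {
        // Hypothetical sizes: the whole table root vs. only the partitions
        // that survive partition filtering.
        long wholeTableBytes = 500L * 1_000_000_000L;
        long filteredPartitionBytes = 12L * 1_000_000_000L;

        System.out.println("Estimate from table root:          "
                + estimateReducers(wholeTableBytes, BYTES_PER_REDUCER, MAX_REDUCERS));
        System.out.println("Estimate from filtered partitions: "
                + estimateReducers(filteredPartitionBytes, BYTES_PER_REDUCER, MAX_REDUCERS));
    }
}
```

Under these assumptions, sizing from the table root asks for far more reducers than the filtered partitions need, which is why having the loader report the size of the data it will actually read gives a better estimate.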
