Thanks Bill. This is exactly what I was looking for. We are using version 11 but not the latest from the trunk.
I would have to rebuild using the latest.

Alex

On Fri, Jun 1, 2012 at 12:55 PM, Bill Graham <[email protected]> wrote:

> What version of Pig are you running, and if it's not the trunk can you try
> with the trunk?
>
> There have been a number of improvements to how we get total input size
> when estimating reducers. Basically, the input size is now requested from
> the LoadFunc, which has more info about statistics.
>
> See
> https://issues.apache.org/jira/browse/PIG-2573
> https://issues.apache.org/jira/browse/PIG-2693
>
> On Fri, Jun 1, 2012 at 8:49 AM, Alex Rovner <[email protected]> wrote:
>
> > Hello,
> >
> > We wrote a HiveLoader that loads data from a Hive warehouse
> > (HCatalog had roadblocks at the time and we decided against using it).
> >
> > We have one minor issue that would be great to solve: currently Pig
> > cannot correctly estimate how many reducers to use when loading data
> > from a Hive warehouse.
> >
> > We have looked through the code and traced the problem to the following:
> >
> > Pig is using the location returned from "relativeToAbsolutePath" to
> > figure out how many reducers it needs. In the case of loading from Hive,
> > we do not know the paths that we need to load until the setPartition()
> > call is made. We can of course set the root of the table as the path in
> > the "relativeToAbsolutePath" call, but that would make Pig over-estimate
> > the number of reducers needed, since we won't take into account the
> > partition filtering that is taking place.
> >
> > Are there any workarounds for this issue?
> > From my understanding, it would be sufficient if the
> > relativeToAbsolutePath call was made after the setLocation and
> > setPartition calls.
> >
> > Any input would be appreciated.
> >
> > Thanks
> > Alex
>
> --
> *Note that I'm no longer using my Yahoo! email address. Please email me at
> [email protected] going forward.*
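
To make the over-estimation described in the thread concrete, here is a rough, self-contained sketch of the arithmetic. It does not use Pig's classes. The class and method names (ReducerEstimateSketch, estimateReducers) and the partition sizes are hypothetical, and the two constants are assumed to mirror the defaults of pig.exec.reducers.bytes.per.reducer and pig.exec.reducers.max; check them against the Pig build in use.

```java
public class ReducerEstimateSketch {
    // Assumed defaults for pig.exec.reducers.bytes.per.reducer and
    // pig.exec.reducers.max; verify against your Pig version.
    static final long BYTES_PER_REDUCER = 1000L * 1000 * 1000;
    static final int MAX_REDUCERS = 999;

    // Hypothetical helper: one reducer per BYTES_PER_REDUCER of input,
    // rounded up, clamped to the range [1, MAX_REDUCERS].
    static int estimateReducers(long totalInputBytes) {
        if (totalInputBytes <= 0) {
            return 1;
        }
        long wanted = (totalInputBytes + BYTES_PER_REDUCER - 1) / BYTES_PER_REDUCER;
        return (int) Math.min(MAX_REDUCERS, Math.max(1L, wanted));
    }

    // Sum the byte sizes of a set of partitions.
    static long totalBytes(long[] partitionSizes) {
        long sum = 0;
        for (long size : partitionSizes) {
            sum += size;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Invented sizes for three partitions of one table.
        long[] wholeTable = {40_000_000_000L, 35_000_000_000L, 2_500_000_000L};
        // Only the last partition survives the partition filter.
        long[] afterFilter = {2_500_000_000L};

        System.out.println("table root:   " + estimateReducers(totalBytes(wholeTable)));
        System.out.println("after filter: " + estimateReducers(totalBytes(afterFilter)));
    }
}
```

With these made-up sizes, sizing from the table root asks for far more reducers than sizing from the one partition that is actually read, which is the gap a loader-reported input size is meant to close.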
