Looking at your code, I found the following mistakes:

SUM((IsEmpty(pared.n2) ? 0:1)) tries to compute SUM(0) or SUM(1), while SUM expects a bag (a column projected out of a bag, such as pared.n2), not a single value.
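
To make the bag point concrete, here is a small sketch using the aliases from your mail. The comment only repeats the schema that your second error message printed; I have not run this, and the exact describe output may differ between Pig versions.

```pig
-- after the group, each row is (group, bag of the original pared tuples)
grouped = group pared by n1;
describe grouped;
-- schema as reported in your ERROR 1000 message:
-- {group: chararray,pared: {n1: chararray,n2: chararray}}
```

Aggregates such as SUM and COUNT take a column projected out of that inner bag (pared.n2 here), and SUM additionally needs that column to be numeric.
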
COUNT(pared.n2) can return 0, in which case you are dividing by zero. It would be better to filter out the nulls first, or to test for NULL values explicitly. That avoids an internal exception that leaves you with a NULL result.

For your second version, give this a try; I hope it does the trick. Note that n2 stays in the projection, which is what your version was missing when it failed with "Invalid alias: n2":

-- n2 is a scalar here, so test it with "is null" (IsEmpty is meant for bags and maps)
pared = foreach beacon_fact generate n1, n2, (n2 is null ? 0 : 1) as ooz:int;
grouped = group pared by n1;
-- sum the 0/1 flag column, and guard against COUNT returning 0
counted = foreach grouped generate group, (COUNT(pared.n2) == 0 ? 0.0 : (double)SUM(pared.ooz)/(double)COUNT(pared.n2)) as ratio:double;

Regards
-Vincent

On Tue, Nov 30, 2010 at 7:17 PM, Jonathan Coveney <jcove...@gmail.com> wrote:

> (not sure if this double posted or not... I accidentally sent it to the
> Hadoop mailing list and not the pig mailing list)
>
> I appreciate any help you can give. I've searched around and haven't found
> anything directly related... I've gone through documentation but can't find
> a real reason why this doesn't work.
>
> Here is the gist of my code (n1 is arbitrary, just to group by, n2 is either
> null or a large integer):
>
> table = LOAD stuff AS (n1:chararray, n2:chararray, other irrelevant stuff);
> pared = foreach table generate n1, n2;
> grouped = group pared by n1;
> counted = foreach grouped generate group, (double)SUM((IsEmpty(pared.n2) ?
> 0:1))/(double)COUNT(pared.n2) as ratio:double;
> ordered = order counted by ratio desc;
> limited = limit ordered 200;
> dump limited;
>
> This gets this error:
>
> ERROR 1045: Could not infer the matching function for
> org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an
> explicit cast.
>
> If I take out the double parenthesis in the counted sum
>
> ERROR 1000: Error during parsing. Invalid alias: SUM in {group:
> chararray,pared: {n1: chararray,n2: chararray}}
>
> I THINK the error is that sum wants the column of a bag as an input, not
> actual integers...so I thought I'd try and make that happen by making the
> input take the form I want.
>
> So in order to try and get around this, I thought this might work (changing
> only these lines)
>
> pared = foreach beacon_fact generate n1, (IsEmpty(n2) ? 0 : 1) as ooz:int;
> grouped = group pared by n1;
> counted = foreach grouped generate group,
> (double)SUM(pared.n1)/(double)COUNT(pared.n2) as ratio:double;
>
> But this gives this error:
> ERROR 1000: Error during parsing. Invalid alias: n2 in {n1: chararray,ooz:
> int}
>
> I have no real clue why this fails... I tried breaking it up into two steps
> and it doesn't matter.
>
> I'd ideally like to do this without making a UDF, as I feel the base
> functionality should support it. Not sure.
>
> Either way, I'd appreciate any help or pointers, as well as any rationale as
> to why it does or doesn't work within the pig framework. The whole bag
> system is still somewhat counterintuitive.
>
> Thank you for your time
>