(Not sure whether this double-posted; I accidentally sent it to the Hadoop mailing list instead of the Pig mailing list.)
I appreciate any help you can give. I've searched around and haven't found anything directly related, and I've gone through the documentation but can't find a real reason why this doesn't work.

Here is the gist of my code (n1 is arbitrary, just something to group by; n2 is either null or a large integer):

    table = LOAD stuff AS (n1:chararray, n2:chararray, other irrelevant stuff);
    pared = foreach table generate n1, n2;
    grouped = group pared by n1;
    counted = foreach grouped generate group, (double)SUM((IsEmpty(pared.n2) ? 0:1))/(double)COUNT(pared.n2) as ratio:double;
    ordered = order counted by ratio desc;
    limited = limit ordered 200;
    dump limited;

This gives the following error:

    ERROR 1045: Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.

If I take out the double parentheses in the SUM on the counted line, I get this instead:

    ERROR 1000: Error during parsing. Invalid alias: SUM in {group: chararray,pared: {n1: chararray,n2: chararray}}

I think the problem is that SUM wants a column of a bag as its input, not actual integers, so I tried to get the input into that form. I thought this might work (changing only these lines):

    pared = foreach beacon_fact generate n1, (IsEmpty(n2) ? 0 : 1) as ooz:int;
    grouped = group pared by n1;
    counted = foreach grouped generate group, (double)SUM(pared.n1)/(double)COUNT(pared.n2) as ratio:double;

But this gives the following error:

    ERROR 1000: Error during parsing. Invalid alias: n2 in {n1: chararray,ooz: int}

I have no real clue why this fails. I tried breaking it up into two steps and it makes no difference.

I'd ideally like to do this without writing a UDF, as I feel the base functionality should support it, though I'm not sure. Either way, I'd appreciate any help or pointers, as well as any rationale for why it does or doesn't work within the Pig framework. The whole bag system is still somewhat counterintuitive.

Thank you for your time.