The issue is that (int)b.x does not cast each x in the bag to an int;
rather, it tries to cast the bag itself, which Pig cannot do. Short of
flattening the bag and projecting x as an int, which is inefficient, I
suppose you could write a UDF that calculates the average of chararrays by
casting each one to an int... but that raises the question of why you
can't just load the field as x:int in the first place.
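
For what it's worth, here is a rough sketch of the flatten-and-cast route.
It assumes each input row also carries a unique key, here called id, so the
flattened values can be grouped back per row; that field is hypothetical and
is not in the original schema. Treat it as an illustration, not a tested
script.

```
-- ASSUMPTION: 'id' is a hypothetical per-row key; the original input has only the bag b.
A = load 'input' as (id:chararray, b:bag{t:(x:chararray, y:int)});

-- Flatten the bag so each x becomes a scalar field on its own row.
F = foreach A generate id, flatten(b.x) as x;

-- A scalar chararray can be cast; a whole bag cannot.
C = foreach F generate id, (int)x as x;

-- Regroup by the row key and average the cast values per original row.
G = group C by id;
B = foreach G generate group as id, AVG(C.x);
```

Note that rows whose bag is empty disappear at the flatten step, and the
extra group is the reason this is less efficient than loading x as an int.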

So in general, you need to do something like "foreach rel generate (int)x",
which casts a scalar field rather than a bag. In this case that doesn't work
as efficiently because x is nested inside a bag, but this is kind of a weird
case.
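
For comparison, this is roughly what the usual pattern looks like when x is
a top-level field instead of being nested in a bag. The flat schema and the
relation names are assumptions made for illustration only.

```
-- ASSUMPTION: a flat input where x is a top-level chararray field.
R = load 'input' as (x:chararray, y:int);

-- Cast the scalar field directly in a foreach.
S = foreach R generate (int)x as x, y;

-- AVG takes a bag, so group first and then average the projected column.
G = group S all;
M = foreach G generate AVG(S.x);
```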

2012/2/14 Haitao Yao <yao.e...@gmail.com>

> hi, all
>        here's my pig script:
>
> A = load 'input' as (b:bag{t:(x:int, y:int)});
> B = foreach A generate AVG(b.x);
> describe B;
>
>  It works well.
>  If b.x is a chararray, problems arise:
> A = load 'input' as (b:bag{t:(x:chararray, y:int)});
> B = foreach A generate AVG((int)b.x);
> 2012-02-15 14:17:17,937 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1052:
> <line 4, column 28> Cannot cast bag with schema :bag{:tuple(x:chararray)}
> to int
> Details at logfile: /tmp/pig_1329286634873.log
>
> Why?  How can I calculate the avg of b.x if b.x must be a chararray?
>
>
> here's the running snapshot in Grunt:
>
> grunt> A = load 'input' as (b:bag{t:(x:int, y:int)});
> grunt> B = foreach A generate AVG(b.x);
> grunt> describe B;
> B: {double}
> grunt> A = load 'input' as (b:bag{t:(x:chararray, y:int)});
> grunt> B = foreach A generate AVG((int)b.x);
> 2012-02-15 14:17:17,937 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> ERROR 1052:
> <line 4, column 28> Cannot cast bag with schema :bag{:tuple(x:chararray)}
> to int
> Details at logfile: /tmp/pig_1329286634873.log
> grunt>
>
> thanks.
>
>
