It works
This confirms that Pig is better than Java MapReduce :-)
Thanks everyone for their help.
Input:
Toy Story|0|0|0|0|1|1|0|0|0
GoldenEye|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
SomeNewMovie|0|0|0|0|1|1|0|0|0|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1
S
Russell's code works with a little modification. (The cast to int doesn't
work though.)
movie_and_genres = FOREACH movies GENERATE $0 as movie_name,
    (bag{tuple()})TOBAG($2 ..) AS genres: bag{genre_bit: tuple()};
foo = foreach movie_and_genres generate movie_name, (int)SUM(genres) as genre_total;
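For anyone who wants to sanity-check the totals outside Pig, here is a minimal Python sketch (not Pig Latin) of the same per-movie sum. It assumes the pipe-delimited layout from the sample input: the name first, then a variable number of 0/1 genre flags. The helper name `genre_total` and the `first_genre_field` parameter are hypothetical, introduced only for this illustration. Set `first_genre_field` to 2 to mirror the `$2 ..` range used in the script above.

```python
def genre_total(line, first_genre_field=1, sep="|"):
    """Return (movie_name, sum of the genre flags) for one input line.

    first_genre_field is an assumption: index of the first flag column,
    counted the way Pig counts positional fields ($0 is the name).
    """
    fields = line.rstrip("\n").split(sep)
    flags = fields[first_genre_field:]
    return fields[0], sum(int(flag) for flag in flags)


# Two rows copied from the sample input in this thread.
sample = [
    "Toy Story|0|0|0|0|1|1|0|0|0",
    "GoldenEye|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0",
]

totals = [genre_total(line) for line in sample]
for name, total in totals:
    print(name, total)
```

Rows may have different numbers of flag columns, which is the same situation the open-ended range handles in the Pig script.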
It seems like I'm getting closer:
With this data:
Toy Story|0
GoldenEye|0|1|0|1
And this script:
movies = load 'movies' USING PigStorage('|');
movie_and_genres = FOREACH movies GENERATE $0, TOTUPLE($1 ..);
DUMP movie_and_genres;
describe movie_and_genres;
I get this output:
(Toy Story,(0))
(G
On Sun, Mar 10, 2013 at 11:15 PM, Russell Jurney wrote:
> Try:
>
> movie_and_genres = FOREACH movies GENERATE $0,
> (b:bag{t:tuple(i:int)})TOBAG($2 ..) AS genres:b:bag{t:tuple(i:int)};
> foo = foreach movie_and_genres generate SUM(genres) as genre_total;
Hi Russell
Thanks for your help, but I c
Try:
movie_and_genres = FOREACH movies GENERATE $0,
(b:bag{t:tuple(i:int)})TOBAG($2 ..) AS genres:b:bag{t:tuple(i:int)};
foo = foreach movie_and_genres generate SUM(genres) as genre_total;
Russell Jurney http://datasyndrome.com
On Mar 10, 2013, at 7:45 AM, Nathan Neff wrote:
> movie_and_genr
Hello
I'm trying to find a SUM of a range of fields, and am having difficulty.
I have the following data structure (from the movielens public dataset)
where there's a "fixed" field of "Name" and there's a denormalized "genres" list
(for example, the first column is "action", second is "comedy", e