Re: SUM of project-range of fields?

2013-03-19 Nathan Neff
It works! This confirms that Pig is better than Java MapReduce :-) Thanks, everyone, for your help.

Input:

Toy Story|0|0|0|0|1|1|0|0|0
GoldenEye|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
SomeNewMovie|0|0|0|0|1|1|0|0|0|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1|1
S

Re: SUM of project-range of fields?

2013-03-19 Abhinav Neelam
Russell's code works with a little modification. (The cast to int doesn't work, though.)

movie_and_genres = FOREACH movies GENERATE $0 as movie_name, (bag{tuple()})TOBAG($2 ..) AS genres: bag{genre_bit: tuple()};
foo = foreach movie_and_genres generate movie_name, (int)SUM(genres) as genre_total;
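This version leaves the bag's tuples untyped and casts the result of SUM instead. Assuming Pig's documented behavior that SUM adds untyped (bytearray) values as doubles, the `(int)` cast turns a result such as 2.0 back into 2. Below is a rough plain-Python model of that pipeline, not Pig itself; the function names are hypothetical, and `start=2` mirrors `$2 ..` in the script, so adjust it to wherever the genre flags begin in your data.

```python
from typing import List, Tuple


def untyped_sum(line: str, start: int = 2, sep: str = "|") -> float:
    """Model of SUM over (bag{tuple()})TOBAG($start ..) with untyped fields."""
    fields = line.split(sep)
    # TOBAG-style bag: one single-field tuple per selected field.
    bag: List[Tuple[float]] = [(float(f),) for f in fields[start:]]
    # Assumption: untyped (bytearray) values are summed as doubles -> float result.
    return sum(t[0] for t in bag)


def genre_total(line: str, start: int = 2, sep: str = "|") -> int:
    # Corresponds to the explicit (int)SUM(genres) cast in the script.
    return int(untyped_sum(line, start, sep))


if __name__ == "__main__":
    record = "Toy Story|0|0|0|0|1|1|0|0|0"
    print(untyped_sum(record))  # 2.0
    print(genre_total(record))  # 2
```

Without the final cast the total stays a floating-point value, which is why the script wraps `SUM(genres)` in `(int)`.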

Re: SUM of project-range of fields?

2013-03-18 Nathan Neff
It seems like I'm getting closer. With this data:

Toy Story|0
GoldenEye|0|1|0|1

and this script:

movies = load 'movies' USING PigStorage('|');
movie_and_genres = FOREACH movies GENERATE $0, TOTUPLE($1 ..);
DUMP movie_and_genres;
describe movie_and_genres;

I get this output:

(Toy Story,(0))
(G
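The output above hints at the problem: TOTUPLE packs the whole project-range into a single tuple, whereas SUM aggregates over a bag whose tuples each hold one value. A rough plain-Python model of the two shapes follows, with lists and tuples standing in for Pig bags and tuples; the helper names are hypothetical.

```python
from typing import List, Tuple


def totuple_range(fields: List[str], start: int) -> Tuple[str, ...]:
    # Like TOTUPLE($1 ..): all selected fields packed into ONE tuple.
    return tuple(fields[start:])


def tobag_range(fields: List[str], start: int) -> List[Tuple[str]]:
    # Like TOBAG($1 ..): each selected field wrapped in its own one-field tuple.
    return [(f,) for f in fields[start:]]


def sum_bag(bag: List[Tuple[str]]) -> int:
    # SUM aggregates over a bag; each tuple must hold exactly one value.
    return sum(int(t[0]) for t in bag)


if __name__ == "__main__":
    fields = "GoldenEye|0|1|0|1".split("|")
    print(totuple_range(fields, 1))         # ('0', '1', '0', '1')
    print(tobag_range(fields, 1))           # [('0',), ('1',), ('0',), ('1',)]
    print(sum_bag(tobag_range(fields, 1)))  # 2
```

Only the bag-shaped result gives SUM something to iterate over, which is why the suggestions in this thread switch from TOTUPLE to TOBAG.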

Re: SUM of project-range of fields?

2013-03-18 Nathan Neff
On Sun, Mar 10, 2013 at 11:15 PM, Russell Jurney wrote:
> Try:
>
> movie_and_genres = FOREACH movies GENERATE $0,
> (b:bag{t:tuple(i:int)})TOBAG($2 ..) AS genres:b:bag{t:tuple(i:int)};
> foo = foreach movies_and_genres generate SUM(genres) as genre_total;

Hi Russell,

Thanks for your help, but I c

Re: SUM of project-range of fields?

2013-03-10 Russell Jurney
Try:

movie_and_genres = FOREACH movies GENERATE $0, (b:bag{t:tuple(i:int)})TOBAG($2 ..) AS genres:b:bag{t:tuple(i:int)};
foo = foreach movies_and_genres generate SUM(genres) as genre_total;

Russell Jurney http://datasyndrome.com

On Mar 10, 2013, at 7:45 AM, Nathan Neff wrote:
> movie_and_genr

SUM of project-range of fields?

2013-03-10 Nathan Neff
Hello, I'm trying to find the SUM of a range of fields and am having difficulty. I have the following data structure (from the MovieLens public dataset), where there's a "fixed" field, "Name", and a denormalized "genres" list (for example, the first column is "action", second is "comedy", e
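Stated outside Pig, the goal is to add up every field after the name, however many genre columns a record has, which is what a project-range such as `$1 ..` selects. A minimal plain-Python sketch, assuming each record is a name followed by 0/1 genre flags separated by `|` (as in the sample records Nathan posts in his follow-up); `sum_project_range` is a hypothetical helper, not part of Pig.

```python
def sum_project_range(line: str, start: int, sep: str = "|") -> int:
    """Sum every field from index `start` to the end of a delimited record.

    Mirrors what a Pig project-range such as `$1 ..` selects: all fields
    from position `start` onward, however many this record has.
    """
    fields = line.split(sep)
    return sum(int(f) for f in fields[start:])


if __name__ == "__main__":
    # Sample records from this thread: name first, then genre flags.
    for record in ["Toy Story|0", "GoldenEye|0|1|0|1"]:
        name = record.split("|")[0]
        print(name, sum_project_range(record, start=1))
```

Because the slice runs to the end of each record, rows with different numbers of genre columns are handled without declaring a fixed schema.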