Yes, but I'm still not able to compute the percentage. I've joined the bags as shown below.

A = LOAD '/data/marco/foo.csv' USING PigStorage(',') AS (name:chararray, region:chararray, gender:chararray, iq:int);
iq_per_region_per_gender = GROUP A BY (region, gender);
total_iq_per_gender = GROUP A BY (gender);

describe iq_per_region_per_gender;
iq_per_region_per_gender: {group: (region: chararray,gender: chararray),A: {(name: chararray,region: chararray,gender: chararray,iq: int)}}

describe total_iq_per_gender;
total_iq_per_gender: {group: chararray,A: {(name: chararray,region: chararray,gender: chararray,iq: int)}}

total = JOIN iq_per_region_per_gender BY group.gender, total_iq_per_gender BY $0;

describe total;
total: {iq_per_region_per_gender::group: (region: chararray,gender: chararray),iq_per_region_per_gender::A: {(name: chararray,region: chararray,gender: chararray,iq: int)},total_iq_per_gender::group: chararray,total_iq_per_gender::A: {(name: chararray,region: chararray,gender: chararray,iq: int)}}

-- Now I would like to use the joined data
-- to produce something like this:
-- Male, Here, 0.2
-- Female, Here, 0.8
-- Male, There, 1
-- Female, There, 0
-- But I'm not sure what my FOREACH ... GENERATE needs to look like.

On Wed, Oct 12, 2011 at 10:34 AM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:

> Sure, just join your total counts with your partials on gender.
>
> D
>
> On Tue, Oct 11, 2011 at 11:58 PM, Marco Cadetg <ma...@zattoo.com> wrote:
>
> > D'oh, I just noticed that my example was unfortunately a bit oversimplified.
> > The total needs to be grouped by another field, like below.
> >
> > A = LOAD 'student' USING PigStorage() AS (name:chararray, region:chararray,
> >     gender:chararray, iq:int);
> > DUMP A;
> > (Eva, There, Female, 500)
> > (John, There, Male, 10)
> > (Alf, There, Male, 10)
> > (ET, There, Male, 10)
> > (Mary, Here, Female, 80)
> > (Bill, Here, Male, 100)
> > (Joe, Here, Male, 150)
> >
> > total_iq_per_region = GROUP A BY (region, gender);
> >
> > total_iq_per_region_per_gender = FOREACH total_iq_per_region
> > {
> >   GENERATE FLATTEN(group),
> >   SUM(A.iq) AS iq_per_region_per_gender;
> > }
> >
> > total_iq_per_gender = GROUP A BY (gender);
> >
> > total_iq_per_gender = FOREACH total_iq_per_gender
> > {
> >   GENERATE FLATTEN(group),
> >   SUM(A.iq) AS iq_per_gender;
> > }
> >
> > Now I guess I could use JOIN to combine both bags (?) by gender, but
> > somehow I don't get it.
> >
> > Thanks
> > -Marco
> >
> > On Tue, Oct 11, 2011 at 6:02 PM, Marco Cadetg <ma...@zattoo.com> wrote:
> >
> > > Thanks a lot, Shawn! Looks like I need to learn some basics ;)
> > > -Marco
> > >
> > > On Tue, Oct 11, 2011 at 5:39 PM, Xiaomeng Wan <shawn...@gmail.com> wrote:
> > >
> > >> total_iq = foreach (group A all) generate SUM(A.iq) as total;
> > >>
> > >> total_iq_per_region = FOREACH total_iq_per_region
> > >> {
> > >>   GENERATE FLATTEN(group),
> > >>   (double)SUM(A.iq)/total_iq.total AS iq_per_region;
> > >> }
> > >>
> > >> Shawn
> > >>
> > >>
> > >> On Tue, Oct 11, 2011 at 9:20 AM, Marco Cadetg <ma...@zattoo.com> wrote:
> > >> > Hi there,
> > >> >
> > >> > I need to do something like this:
> > >> >
> > >> > A = LOAD 'student' USING PigStorage() AS (name:chararray, region:chararray,
> > >> >     iq:int);
> > >> > DUMP A;
> > >> > (John, There, 10)
> > >> > (Alf, There, 10)
> > >> > (ET, There, 10)
> > >> > (Mary, Here, 80)
> > >> > (Bill, Here, 100)
> > >> > (Joe, Here, 150)
> > >> >
> > >> > total_iq_per_region = GROUP A BY (region);
> > >> >
> > >> > total_iq_per_region = FOREACH total_iq_per_region
> > >> > {
> > >> >   GENERATE FLATTEN(group),
> > >> >   SUM(A.iq) AS iq_per_region;
> > >> > }
> > >> >
> > >> > total_iq = FOREACH A
> > >> > {
> > >> >   GENERATE SUM(iq) AS total_iq;
> > >> > }
> > >> >
> > >> > Now I would like to compute the percentage for each region, e.g.
> > >> > iq_per_region / total_iq, and store the result. How can I achieve that?
> > >> > I hope my example is not too confusing.
> > >> >
> > >> > Cheers
> > >> > -Marco
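Putting the thread's suggestions together (Dmitriy's JOIN on gender, plus a cast to avoid integer division), the final question might be answered along these lines. This is an untested sketch under a few assumptions: Pig 0.8 or later (needed for the scalar projection `total_iq.total` in the last part), a comma-separated input file with no spaces after the commas, and a hypothetical output path `/data/marco/iq_share`. The relation and field names other than `A` are illustrative. Flattening the group keys into plain fields before joining keeps the FOREACH after the JOIN simple, since each side then carries a single number instead of a whole bag.

```pig
-- Sketch only; assumes a comma-separated file without spaces after commas.
A = LOAD '/data/marco/foo.csv' USING PigStorage(',')
    AS (name:chararray, region:chararray, gender:chararray, iq:int);

-- Partial sums: one row per (region, gender) pair.
by_region_gender = GROUP A BY (region, gender);
iq_rg = FOREACH by_region_gender GENERATE
    FLATTEN(group) AS (region, gender),
    SUM(A.iq) AS iq_sum;

-- Totals: one row per gender, summed over all regions.
by_gender = GROUP A BY gender;
iq_g = FOREACH by_gender GENERATE
    group AS gender,
    SUM(A.iq) AS iq_total;

-- Attach each gender's total to its partial sums.
joined = JOIN iq_rg BY gender, iq_g BY gender;

-- After a JOIN, fields are prefixed with "relation::". SUM over ints
-- returns a long, so cast to double to avoid integer division.
share = FOREACH joined GENERATE
    iq_rg::gender AS gender,
    iq_rg::region AS region,
    (double) iq_rg::iq_sum / iq_g::iq_total AS ratio;

-- Hypothetical output path.
STORE share INTO '/data/marco/iq_share' USING PigStorage(',');

-- Original single-total variant: a one-row relation can be used as a
-- scalar (Pig 0.8+), so no JOIN is needed.
total_iq = FOREACH (GROUP A ALL) GENERATE SUM(A.iq) AS total;
by_region = GROUP A BY region;
region_share = FOREACH by_region GENERATE
    group AS region,
    (double) SUM(A.iq) / total_iq.total AS ratio;
```

With the seven sample rows from the thread, the Female total is 500 + 80 = 580 and the Male total is 30 + 250 = 280, so Female/There would come out at roughly 0.86 and Male/Here at roughly 0.89. This divides by the gender total, which is what the JOIN on gender implies; if instead the values for each region should sum to 1 (as the example output with 0.2/0.8 and 1/0 suggests), compute the totals per region and join on region.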