Yes but I'm still not able to compute the percentage. I've joined the bags
as below.

A = LOAD '/data/marco/foo.csv' USING PigStorage(',') AS (name:cha
rarray, region:chararray, gender:chararray, iq:int);
iq_per_region_per_gender = GROUP A BY (region, gender);
total_iq_per_gender = GROUP A BY (gender);

describe iq_per_region_per_gender
iq_per_region_per_gender: {group: (region: chararray,gender: chararray),A:
{(name: chararray,region: chararray,gender: chararray,iq: int)}}

describe total_iq_per_gender;
total_iq_per_gender: {group: chararray,A: {(name: chararray,region:
chararray,gender: chararray,iq: int)}}

total = JOIN iq_per_region_per_gender BY group.gender, total_iq_per_gender
BY $0;

describe total
total: {iq_per_region_per_gender::group: (region: chararray,gender:
chararray),iq_per_region_per_gender::A: {(name: chararray,region:
chararray,gender: chararray,iq: int)},total_iq_per_gender::group:
chararray,total_iq_per_gender::A: {(name: chararray,region:
chararray,gender: chararray,iq: int)}}

-- Now I would like to use the 'joined' data.
-- providing me sth like this:
-- Male, Here, 0.2
-- Female, Here, 0,8
-- Male, There, 1
-- Female, There, 0
-- But I'm not sure how my FOREACH GENERATE needs to look like.


On Wed, Oct 12, 2011 at 10:34 AM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:

> Sure, just join your total counts with your partials on gender.
>
> D
>
> On Tue, Oct 11, 2011 at 11:58 PM, Marco Cadetg <ma...@zattoo.com> wrote:
>
> > D'oh I just see that unfortunately my example was a bit over simplified.
> > The
> > total needs to be grouped by another field like below.
> >
> > A = LOAD 'student' USING PigStorage() AS (name:chararray,
> region:chararray,
> > gender:charrarray, iq:int);
> > DUMP A;
> > (Eva, There, Female,500)
> > (John, There, Male, 10)
> > (Alf, There, Male, 10)
> > (ET, There, Male, 10)
> > (Mary, Here, Female, 80)
> > (Bill, Here, Male, 100)
> > (Joe, Here, Male, 150)
> >
> > total_iq_per_region = GROUP A BY (region, gender);
> >
> > total_iq_per_region_per_gender = FOREACH total_iq_per_region
> > {
> >  GENERATE FLATTEN(group),
> >  SUM(A.iq) AS iq_per_region_per_gender;
> > }
> >
> > total_iq_per_gender = GROUP A BY (gender);
> >
> > total_iq_per_gender = FOREACH A
> > {
> >  GENERATE FLATTEN(group),
> >  SUM(A.iq) AS iq_per_gender;
> > }
> >
> > Now I guess I could use JOIN to combine both bags(?) by gender but
> somehow
> > I
> > don't get it.
> >
> > Thanks
> > -Marco
> >
> > On Tue, Oct 11, 2011 at 6:02 PM, Marco Cadetg <ma...@zattoo.com> wrote:
> >
> > > Thanks a lot, Shawn! Looks like I need to learn some basics ;)
> > > -Marco
> > >
> > > On Tue, Oct 11, 2011 at 5:39 PM, Xiaomeng Wan <shawn...@gmail.com>
> > wrote:
> > >
> > >> total_iq = foreach (group A by all) generate SUM(A.iq) as total;
> > >>
> > >> total_iq_per_region = FOREACH total_iq_per_region
> > >> {
> > >>  GENERATE FLATTEN(group),
> > >>  SUM(A.iq)/total_iq.total AS iq_per_region;
> > >> }
> > >>
> > >> Shawn
> > >>
> > >>
> > >> On Tue, Oct 11, 2011 at 9:20 AM, Marco Cadetg <ma...@zattoo.com>
> wrote:
> > >> > Hi there,
> > >> >
> > >> > I would need to do something like this:
> > >> >
> > >> > A = LOAD 'student' USING PigStorage() AS (name:chararray,
> > >> region:chararry,
> > >> > iq:int);
> > >> > DUMP A;
> > >> > (John, There, 10)
> > >> > (Alf, There, 10)
> > >> > (ET, There, 10)
> > >> > (Mary, Here, 80)
> > >> > (Bill, Here, 100)
> > >> > (Joe, Here, 150)
> > >> >
> > >> > total_iq_per_region = GROUP A BY (region);
> > >> >
> > >> > total_iq_per_region = FOREACH total_iq_per_region
> > >> > {
> > >> >  GENERATE FLATTEN(group),
> > >> >  SUM(A.iq) AS iq_per_region;
> > >> > }
> > >> >
> > >> > total_iq = FOREACH A
> > >> > {
> > >> >  GENERATE SUM(iq) AS total_iq:
> > >> > }
> > >> >
> > >> > Now I would like to retrieve the percentage of the region e.g.
> > >> iq_per_reqion
> > >> > / total_iq and store the result. How can I achieve that? I hope my
> > >> example
> > >> > is not too confusing.
> > >> >
> > >> > Cheers
> > >> > -Marco
> > >> >
> > >>
> > >
> > >
> >
>

Reply via email to