In this case, does b refer to the tuples corresponding to a single
group? If so, I still do not get the point, because b is a bag that contains
all the records and not only the records of a single group.

On Jul 21, 2014 8:33 PM, <william.dowl...@thomsonreuters.com> wrote:
>
> This was hard for me to get when I started using Pig, and it still annoys
> me after 1.5 years' experience with Pig. In mathematics and logic,
> quantifiers (like "for each", "there exists") bind variables that occur in
> their scope:
> (for each x)(there exists y) [y > x]
>
> The (for each x) binds x in (there exists y) [y > x]
>
> But in Pig the variable x in (for each x) *does not bind occurrences of
> x* in the following subexpression. IMO this is an unnecessary stumbling
> block for people learning Pig who have a background in math or logic.
>
> Here is how you can read
>         foreach c generate COUNT(b), group;
> so it makes sense:
>         c's components are "group" and (bag) b, so:
>         foreach (group, b) in c generate COUNT(b), group;
>
> I would love it if the Pig syntax were extended to allow quantifiers like
> "foreach (group, b) in c", but I don't know how feasible that would be.
>
> William F Dowling
> Senior Technologist
> Thomson Reuters
>
>
> -----Original Message-----
> From: Ashish Dobhal [mailto:dobhalashish...@gmail.com]
> Sent: Monday, July 21, 2014 10:34 AM
> To: user@pig.apache.org
> Subject: Re: Problem in understanding UDF COUNT
>
> Shahab, thanks.
> My doubt is why we are taking the bag b and not the bag c as the argument
> in the COUNT(b) function.
> The bag c contains the groups and not the bag b.
> Thanks.
>
>
> On Mon, Jul 21, 2014 at 6:21 PM, Shahab Yunus <shahab.yu...@gmail.com>
> wrote:
>
> > Have you seen this documentation and blog?
> > http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig/
> > http://pig.apache.org/docs/r0.9.2/func.html#count
> >
> > They explain this in detail.
> >
> > Regards,
> > Shahab
> >
> >
> > On Mon, Jul 21, 2014 at 8:44 AM, Ashish Dobhal
> > <dobhalashish...@gmail.com>
> > wrote:
> >
> > > a = load '/user/hue/word_count_text.txt';
> > > b = foreach a generate flatten(TOKENIZE((chararray)$0)) as word;
> > > c = group b by word;
> > > d = foreach c generate COUNT(b), group;
> > >
> > > I want to know what would be the input to the UDF COUNT in this
> > > case. Also, what is the meaning of b being passed as an argument?
> > >
> > > Also, I am still not clear about how COUNT operates.
> > >
> > > Thanks
> > >
> > > Ashish
> > >
> >
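
For anyone reading this thread later: the script from the original question can be modelled in plain Python to show what COUNT receives. This is only a rough sketch of the semantics, not how Pig executes the job. The sample lines are made up, str.split() is a simplification of TOKENIZE, and len() stands in for COUNT (which in Pig also skips tuples whose first field is null).

```python
from collections import defaultdict

# Hypothetical stand-in for the lines of word_count_text.txt.
lines = ["the cat saw the dog", "the dog ran"]

# b = foreach a generate flatten(TOKENIZE((chararray)$0)) as word;
# Relation b: one single-field tuple per word.
b = [(word,) for line in lines for word in line.split()]

# c = group b by word;
# Each tuple of c has two fields: "group" (the key) and a bag that is
# *named* b but holds only the tuples of b that share that key.
grouped = defaultdict(list)
for tup in b:
    grouped[tup[0]].append(tup)
c = [(group, bag) for group, bag in grouped.items()]

# d = foreach c generate COUNT(b), group;
# Inside the foreach, "b" means the bag field of the current tuple of c,
# so the count is taken over one group's bag at a time.
d = [(len(bag), group) for group, bag in c]

for count, group in d:
    print(count, group)
```

So the b in COUNT(b) is not the whole relation b. It is the field of c that carries the same name, and in each tuple of c that field holds only the records of one group. That is why the argument is b rather than c.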
