This was hard for me to grasp when I started using Pig, and it still annoys me 
after 1.5 years' experience with Pig. In mathematics and logic, quantifiers 
(like "for each" and "there exists") bind variables that occur in their scope:
(for each x)(there exists y) [y > x]

The (for each x) binds x in (there exists y) [y > x]

But in Pig the variable x in (for each x) *does not bind occurrences of x* in 
the following subexpression. IMO this is an unnecessary stumbling block for 
people learning Pig who have a background in math or logic.

Here is how you can read
        foreach c generate COUNT(b), group;
so it makes sense:
        each tuple in c has two fields, "group" and the bag b, so:
        foreach (group, b) in c generate COUNT(b), group;
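
As a rough model of that reading (not Pig's actual implementation), here is a 
small Python sketch in which the binding is explicit. The sample rows and the 
names group_by_first_field and b_bag are made up for illustration, and len() 
stands in for COUNT, which in real Pig also skips tuples whose first field is 
null.

```python
from collections import defaultdict

# Hypothetical stand-in for relation b: one single-field tuple per token.
b = [("pig",), ("latin",), ("pig",), ("hadoop",), ("pig",)]


def group_by_first_field(relation):
    """Rough model of `c = group b by word`.

    Returns one (group, bag) pair per distinct key, where the bag holds
    every original tuple that carries that key.
    """
    bags = defaultdict(list)
    for row in relation:
        bags[row[0]].append(row)
    return list(bags.items())


c = group_by_first_field(b)

# Rough model of `d = foreach c generate COUNT(b), group;`.
# Here the loop header binds both fields of each tuple of c explicitly:
# `group` is the key, and `b_bag` is the per-group bag that Pig names b.
d = [(len(b_bag), group) for group, b_bag in c]

print(d)
```

So COUNT never sees all of c: it is applied once per tuple of c, to the bag 
field of that tuple.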

I would love it if the Pig syntax were extended to allow quantifiers like  
"foreach (group, b) in c" but I don't know how feasible that would be.

William F Dowling
Senior Technologist
Thomson Reuters


-----Original Message-----
From: Ashish Dobhal [mailto:[email protected]] 
Sent: Monday, July 21, 2014 10:34 AM
To: [email protected]
Subject: Re: Problem in understanding UDF COUNT

Shahab, thanks.
My doubt is why we are passing the bag b, and not the bag c, as the argument 
to the COUNT(b) function.
The bag c contains the groups, not the bag b.
Thanks.


On Mon, Jul 21, 2014 at 6:21 PM, Shahab Yunus <[email protected]>
wrote:

> Have you seen this documentation and blog?
> http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig/
> http://pig.apache.org/docs/r0.9.2/func.html#count
>
> They explain this in detail.
>
> Regards,
> Shahab
>
>
> On Mon, Jul 21, 2014 at 8:44 AM, Ashish Dobhal 
> <[email protected]>
> wrote:
>
> > a = load '/user/hue/word_count_text.txt';
> > b = foreach a generate flatten(TOKENIZE((chararray)$0)) as word;
> > c = group b by word;
> > d = foreach c generate COUNT(b), group;
> >
> > I want to know what the input to the UDF COUNT would be in this 
> > case. Also, what is the meaning of b being passed as an argument?
> >
> > Also, I am still not clear about how COUNT operates.
> >
> > Thanks
> >
> > Ashish
> >
>
