This was hard for me to get when I started using Pig, and it still annoys me
after 1.5 years' experience with Pig. In mathematics and logic, quantifiers
(like "for each", "there exists") bind variables that occur in their scope:
(for each x)(there exists y) [y > x]
The (for each x) binds x in (there exists y) [y > x].
But in Pig the variable x in (for each x) *does not bind occurrences of x* in
the following subexpression. IMO this is an unnecessary stumbling block for
people learning Pig who have a background in math or logic.
Here is how you can read
foreach c generate COUNT(b), group;
so it makes sense:
c's components are "group" and (bag) b, so:
foreach (group, b) in c generate COUNT(b), group;
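
If it helps, here is a rough Python model of that reading. It is only a sketch
of the semantics, not how Pig actually executes the script; the sample words
and the helper name pig_count are made up for illustration. Each row of c is a
pair (group, b), where the second field is the bag of tuples from b that share
that key, and COUNT is applied to that bag once per row.

```python
from collections import defaultdict

# Stand-in for relation b: one single-field tuple per word (sample data, assumed).
b = [("pig",), ("hive",), ("pig",), ("hadoop",), ("pig",)]

# c = group b by word;
# Each row of c is (group, bag), where the bag holds the tuples of b with that key.
# In Pig the bag field is named "b" after the relation that was grouped.
groups = defaultdict(list)
for row in b:
    groups[row[0]].append(row)
c = list(groups.items())

def pig_count(bag):
    """Rough model of COUNT: number of tuples whose first field is not null."""
    return sum(1 for t in bag if t[0] is not None)

# d = foreach c generate COUNT(b), group;
# read as: foreach (group, b) in c generate COUNT(b), group;
d = [(pig_count(bag), group) for group, bag in c]

for count, word in d:
    print(count, word)
```

So the b inside COUNT(b) is not the whole relation b; it is the field named b
in the current row of c.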
I would love it if the Pig syntax were extended to allow quantifiers like
"foreach (group, b) in c" but I don't know how feasible that would be.
William F Dowling
Senior Technologist
Thomson Reuters
-----Original Message-----
From: Ashish Dobhal [mailto:[email protected]]
Sent: Monday, July 21, 2014 10:34 AM
To: [email protected]
Subject: Re: Problem in understanding UDF COUNT
Shahab, thanks.
My doubt is why we are taking the bag b and not the bag c as the argument in
the COUNT(b) function.
The bag c contains the groups, not the bag b.
Thanks.
On Mon, Jul 21, 2014 at 6:21 PM, Shahab Yunus <[email protected]>
wrote:
> Have you seen this documentation and blog?
> http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig/
> http://pig.apache.org/docs/r0.9.2/func.html#count
>
> They explain this in detail.
>
> Regards,
> Shahab
>
>
> On Mon, Jul 21, 2014 at 8:44 AM, Ashish Dobhal
> <[email protected]>
> wrote:
>
> > a = load '/user/hue/word_count_text.txt';
> > b = foreach a generate flatten(TOKENIZE((chararray)$0)) as word;
> > c = group b by word;
> > d = foreach c generate COUNT(b), group;
> >
> > I want to know what would be the input to the UDF COUNT in this
> > case. Also, what is the meaning of b being passed as an argument?
> >
> > Also I am still not clear about how COUNT operates.
> >
> > Thanks
> >
> > Ashish
> >
>