Thanks Shahab and William, I am now clear about the COUNT functionality. But
I still have a doubt about how UDFs work in general.
Example:
a = load 'movies' using PigStorage() as (name:chararray,
movid:int, stars:int, comment:chararray);
b = group a by stars;
c = foreach b generate myudf(a);
In this case, what would be the input to the UDF: the entire group or a
single tuple of that group?
I think the input would be a single tuple of that group for each
iteration, but I am not sure.
Thanks.
Ashish.
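
For reference, the behaviour asked about above can be modelled in a few
lines of Python. This is only a sketch, not Pig itself: group_by and my_udf
are hypothetical stand-ins, and the three movie records are made up for
illustration. It assumes the usual GROUP semantics, where each output tuple
holds the group key plus a bag of all matching records, so the UDF is
called once per group and logically receives that whole bag rather than one
tuple at a time.

```python
from collections import defaultdict


def group_by(relation, key_index):
    """Model Pig's GROUP: return a sorted list of (group_key, bag) pairs."""
    bags = defaultdict(list)
    for record in relation:
        bags[record[key_index]].append(record)
    return sorted(bags.items(), key=lambda pair: pair[0])


def my_udf(bag):
    """Hypothetical stand-in for myudf; here it just counts like COUNT."""
    return len(bag)


# a = load 'movies' ... as (name, movid, stars, comment);  (made-up records)
a = [
    ("Alpha", 1, 5, "great"),
    ("Beta", 2, 3, "ok"),
    ("Gamma", 3, 5, "loved it"),
]

# b = group a by stars;  each element of b is (group, bag of records from a)
b = group_by(a, key_index=2)

# c = foreach b generate group, myudf(a);  the UDF gets one bag per group
c = [(group, my_udf(bag)) for group, bag in b]

print(c)  # [(3, 1), (5, 2)]
```

Here my_udf is called twice: once with a bag of one record (stars = 3) and
once with a bag of two records (stars = 5).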


On Tue, Jul 22, 2014 at 5:30 PM, Shahab Yunus <[email protected]>
wrote:

> That is confusing, and it is something that William Dowling explained in an
> email below.
>
> The scope of the alias b has changed. Now when used with 'for each' on c,
> the alias/variable b will be used just to count what belongs to the current
> c.
>
> Imagine that although b is a bag of all the records, when it is passed to
> the COUNT function in 'foreach c', only those records which belong to the
> current group in c are counted.
>
> Take a look at this link that I sent earlier (especially the age_counts
> example):
> http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig/
>
> It does not explain everything but it is a more detailed example with
> comments and perhaps would help you to understand this Pig specific
> concept.
>
> Regards,
> Shahab
>
>
> On Tue, Jul 22, 2014 at 12:07 AM, Ashish Dobhal <[email protected]>
> wrote:
>
> > In this case, does b refer to the tuples corresponding to a single
> > group? If so, I still do not get the point, because b is a bag that
> > contains all the records and not only the records of a single group.
> >
> > On Jul 21, 2014 8:33 PM, <[email protected]> wrote:
> > >
> > > This was hard for me to get when I started using Pig, and it still
> > > annoys me after 1.5 years' experience with Pig. In mathematics and logic,
> > > quantifiers (like "for each", "there exists") bind variables that occur
> > > in their scope:
> > > (for each x)(there exists y) [y > x]
> > >
> > > The (for each x) binds x in (there exists y) [y > x]
> > >
> > > But in Pig the variable x in (for each x) *does not bind occurrences of
> > > x* in the following subexpression. IMO this is an unnecessary stumbling
> > > block for people learning Pig who have a background in math or logic.
> > >
> > > Here is how you can read
> > >         foreach c generate COUNT(b), group;
> > > so it makes sense:
> > >         c's components are "group" and (bag) b, so:
> > >         foreach (group, b) in c generate COUNT(b), group;
> > >
> > > I would love it if the Pig syntax were extended to allow quantifiers
> > > like "foreach (group, b) in c", but I don't know how feasible that would be.
> > >
> > > William F Dowling
> > > Senior Technologist
> > > Thomson Reuters
> > >
> > >
> > > -----Original Message-----
> > > From: Ashish Dobhal [mailto:[email protected]]
> > > Sent: Monday, July 21, 2014 10:34 AM
> > > To: [email protected]
> > > Subject: Re: Problem in understanding UDF COUNT
> > >
> > > Shahab, thanks.
> > > My doubt is why we are passing the bag b and not the bag c as the
> > > argument to the COUNT(b) function.
> > > It is the bag c that contains the groups, not the bag b.
> > > Thanks.
> > >
> > >
> > > On Mon, Jul 21, 2014 at 6:21 PM, Shahab Yunus <[email protected]>
> > > wrote:
> > >
> > > > Have you seen this documentation and blog?
> > > >
> > > > > http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig/
> > > > > http://pig.apache.org/docs/r0.9.2/func.html#count
> > > >
> > > > They explain this in detail.
> > > >
> > > > Regards,
> > > > Shahab
> > > >
> > > >
> > > > On Mon, Jul 21, 2014 at 8:44 AM, Ashish Dobhal
> > > > <[email protected]>
> > > > wrote:
> > > >
> > > > > > a = load '/user/hue/word_count_text.txt';
> > > > > > b = foreach a generate flatten(TOKENIZE((chararray)$0)) as word;
> > > > > > c = group b by word;
> > > > > > d = foreach c generate COUNT(b), group;
> > > > >
> > > > > > I want to know what the input to the UDF COUNT would be in this
> > > > > > case. Also, what is the meaning of b being passed as an argument?
> > > > > >
> > > > > > I am also still not clear about how COUNT operates.
> > > > >
> > > > > Thanks
> > > > >
> > > > > Ashish
> > > > >
> > > >
> >
>
