The best way to get answers to easy questions like this:
1. read the docs
2. write a sample script and run it
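For step 2, a minimal sample script might look like the sketch below. It assumes a tab-separated input file named 'movies' with the four fields from the question. The comment field is declared as chararray because Pig Latin has no varchar type, and the built-in COUNT stands in for the hypothetical myudf. DESCRIBE prints the schema of b, so you can see that each record of b carries the group key plus a bag named a.

```pig
-- Sketch only: assumes a tab-separated input file named 'movies'.
-- 'comment' is declared as chararray here, since Pig has no varchar type.
a = LOAD 'movies' USING PigStorage()
    AS (name:chararray, movid:int, stars:int, comment:chararray);

-- One output record per distinct 'stars' value.
b = GROUP a BY stars;

-- Prints the schema of b: the group key plus a bag named 'a'.
DESCRIBE b;

-- COUNT (standing in for myudf) is called once per group and
-- receives that group's whole bag, not one tuple at a time.
c = FOREACH b GENERATE group, COUNT(a);
DUMP c;
```

Replacing COUNT with your own function should not change what is passed in: the argument is still the bag a for the current group.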
The docs say that a group (a bag of tuples having the same 'stars' value) would be passed to your UDF.
I can't understand what confuses you. These things are really basic.

2014-07-23 16:30 GMT+04:00 Ashish Dobhal <dobhalashish...@gmail.com>:

> Sorry,
> I mean group a by stars;
>
> On Wed, Jul 23, 2014 at 5:58 PM, Serega Sheypak <serega.shey...@gmail.com> wrote:
>
> > a=load....
> > --
> > b=group movies by stars;
> > --error here movies is not an alias
> >
> > c= foreach b generate myudf(a);
> >
> > 2014-07-23 16:03 GMT+04:00 Ashish Dobhal <dobhalashish...@gmail.com>:
> >
> > > Thanks Shahab and William, I am now clear about the COUNT functionality.
> > > But I still have a doubt about how UDFs work in general.
> > > Example:
> > > a=load 'movies' using PigStorage() as (name:chararray, movid:int,stars:int,comment:varchar(300));
> > > b=group movies by stars;
> > > c= foreach b generate myudf(a);
> > > In this case, what would be the input to the UDF: the entire group or a single tuple of that group?
> > > I think the input would be a single tuple of that group for each iteration, but I am not sure.
> > > Thanks.
> > > Ashish.
> > >
> > > On Tue, Jul 22, 2014 at 5:30 PM, Shahab Yunus <shahab.yu...@gmail.com> wrote:
> > >
> > > > That is confusing, and that is something that William Dowling explained in an email below.
> > > >
> > > > The scope of the alias b has changed. Now, when used with 'for each' on c, the alias/variable b will be used just to count what belongs to the current c.
> > > >
> > > > Imagine that although b is a bag of all the records, when it is passed to the count function in 'for each c', only those items/records that belong to the current c are filtered or counted.
> > > >
> > > > Take a look at this link that I sent earlier (especially the age_counts example):
> > > > http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig/
> > > >
> > > > It does not explain everything, but it is a more detailed example with comments and perhaps would help you to understand this Pig-specific concept.
> > > >
> > > > Regards,
> > > > Shahab
> > > >
> > > > On Tue, Jul 22, 2014 at 12:07 AM, Ashish Dobhal <dobhalashish...@gmail.com> wrote:
> > > >
> > > > > In this case, does b refer to the tuples corresponding to a single group? If so, I still do not get the point, because b is a bag that contains all the records and not only the records of a single group.
> > > > >
> > > > > On Jul 21, 2014 8:33 PM, <william.dowl...@thomsonreuters.com> wrote:
> > > > >
> > > > > > This was hard for me to get when I started using Pig, and it still annoys me after 1.5 years' experience with Pig. In mathematics and logic, quantifiers (like "for each", "there exists") bind variables that occur in their scope:
> > > > > >   (for each x)(there exists y) [y > x]
> > > > > >
> > > > > > The (for each x) binds x in (there exists y) [y > x]
> > > > > >
> > > > > > But in Pig the variable x in (for each x) *does not bind occurrences of x* in the following subexpression. IMO this is an unnecessary stumbling block for people learning Pig who have a background in math or logic.
> > > > > >
> > > > > > Here is how you can read
> > > > > >   foreach c generate COUNT(b), group;
> > > > > > so it makes sense:
> > > > > > c's components are "group" and (bag) b, so:
> > > > > >   foreach (group, b) in c generate COUNT(b), group;
> > > > > >
> > > > > > I would love it if the Pig syntax were extended to allow quantifiers like "foreach (group, b) in c", but I don't know how feasible that would be.
> > > > > >
> > > > > > William F Dowling
> > > > > > Senior Technologist
> > > > > > Thomson Reuters
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Ashish Dobhal [mailto:dobhalashish...@gmail.com]
> > > > > > Sent: Monday, July 21, 2014 10:34 AM
> > > > > > To: user@pig.apache.org
> > > > > > Subject: Re: Problem in understanding UDF COUNT
> > > > > >
> > > > > > Shahab, thanks.
> > > > > > My doubt is why we are taking the bag b and not the bag c as the argument in the COUNT(b) function.
> > > > > > The bag c contains the groups, and not the bag b.
> > > > > > Thanks.
> > > > > >
> > > > > > On Mon, Jul 21, 2014 at 6:21 PM, Shahab Yunus <shahab.yu...@gmail.com> wrote:
> > > > > >
> > > > > > > Have you seen this documentation and blog?
> > > > > > > http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig/
> > > > > > > http://pig.apache.org/docs/r0.9.2/func.html#count
> > > > > > >
> > > > > > > They explain this in detail.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Shahab
> > > > > > >
> > > > > > > On Mon, Jul 21, 2014 at 8:44 AM, Ashish Dobhal <dobhalashish...@gmail.com> wrote:
> > > > > > >
> > > > > > > > a = load '/user/hue/word_count_text.txt';
> > > > > > > > b = foreach a generate flatten(TOKENIZE((chararray)$0)) as word;
> > > > > > > > c = group b by word;
> > > > > > > > d = foreach c generate COUNT(b), group;
> > > > > > > >
> > > > > > > > I want to know what would be the input to the UDF COUNT in this case. Also, what is the meaning of b being passed as an argument?
> > > > > > > >
> > > > > > > > Also, I am still not clear about how COUNT operates.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > Ashish