You are welcome, hope this helps you with Pig. It's really easy.

2014-07-23 17:01 GMT+04:00 Ashish Dobhal <dobhalashish...@gmail.com>:

> Thanks Serega Sheypak.
>
>
> On Wed, Jul 23, 2014 at 6:16 PM, Serega Sheypak <serega.shey...@gmail.com>
> wrote:
>
> > The best way to get answers to such easy questions:
> > 1. read the docs
> > 2. create a sample script and run it
> >
> > The docs say that a group (a bag of tuples having the same 'stars' value)
> > would be passed to your UDF.
> > I can't understand what confuses you. These things are really basic.
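To make this concrete, here is a minimal plain-Python model of `b = group a by stars; c = foreach b generate myudf(a);`. It is a sketch, not Pig's UDF API: Python lists stand in for Pig bags and tuples, and the sample movie rows, the `group_by` helper and the body of `myudf` are hypothetical. It shows that the UDF runs once per group and receives that group's entire bag, not one tuple per call.

```python
from collections import defaultdict

# Hypothetical sample rows for: a = load 'movies' as (name, movid, stars, comment)
a = [
    ("Alien", 1, 5, "classic"),
    ("Cats", 2, 1, "no"),
    ("Heat", 3, 5, "great"),
    ("Jaws", 4, 4, "fun"),
]

def group_by(relation, key_index):
    """Hypothetical model of GROUP: one record per key, holding (group, bag)."""
    bags = defaultdict(list)
    for row in relation:
        bags[row[key_index]].append(row)
    # The bag field is named after the grouped relation ('a' here), as in Pig.
    return [{"group": key, "a": bag} for key, bag in bags.items()]

calls = []  # records the bag the UDF received on each call

def myudf(bag):
    """Hypothetical UDF: receives the whole bag for one group per call."""
    calls.append(bag)
    return [name for (name, _movid, _stars, _comment) in bag]

b = group_by(a, key_index=2)             # b = group a by stars;
c = [(myudf(rec["a"]),) for rec in b]    # c = foreach b generate myudf(a);

for rec, out in zip(b, c):
    print(rec["group"], out)
```

With this sample data `myudf` is called three times, once per distinct `stars` value, and the 5-star call receives both 5-star tuples at once. In a real Java UDF extending `EvalFunc`, that bag would typically arrive as the first field of the `Tuple` passed to `exec`.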
> >
> >
> > 2014-07-23 16:30 GMT+04:00 Ashish Dobhal <dobhalashish...@gmail.com>:
> >
> > > Sorry,
> > > I meant group a by stars;
> > >
> > >
> > > On Wed, Jul 23, 2014 at 5:58 PM, Serega Sheypak <serega.shey...@gmail.com>
> > > wrote:
> > >
> > > > a=load....
> > > > --
> > > > b=group movies by stars;
> > > > -- error here: movies is not an alias
> > > >
> > > > c = foreach b generate myudf(a);
> > > >
> > > >
> > > >
> > > > 2014-07-23 16:03 GMT+04:00 Ashish Dobhal <dobhalashish...@gmail.com>:
> > > >
> > > > > Thanks Shahab and William, I am now clear about the COUNT
> > > > > functionality. But I still have a doubt about how UDFs work in general.
> > > > > Example:
> > > > > a = load 'movies' using PigStorage() as (name:chararray,
> > > > >     movid:int, stars:int, comment:chararray);
> > > > > b = group movies by stars;
> > > > > c = foreach b generate myudf(a);
> > > > > In this case, what would be the input to the UDF: the entire group or a
> > > > > single tuple of that group?
> > > > > I think the input would be a single tuple of that group for each
> > > > > iteration, but I am not sure.
> > > > > Thanks.
> > > > > Ashish.
> > > > >
> > > > >
> > > > > On Tue, Jul 22, 2014 at 5:30 PM, Shahab Yunus <shahab.yu...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > That is confusing, and it is something that William Dowling explained
> > > > > > in an email below.
> > > > > >
> > > > > > The scope of the alias b has changed. Now, when used in 'foreach c',
> > > > > > the alias/variable b is used just to count what belongs to the
> > > > > > current record of c.
> > > > > >
> > > > > > Although b is a bag of all the records, inside 'foreach c' the name b
> > > > > > refers to the bag stored in the current record of c, so COUNT only
> > > > > > sees (and counts) the records that belong to that group.
> > > > > >
> > > > > > Take a look at this link that I sent earlier (especially the age_counts
> > > > > > example):
> > > > > >
> > > > > > http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig/
> > > > > >
> > > > > > It does not explain everything, but it is a more detailed example with
> > > > > > comments and may help you understand this Pig-specific concept.
> > > > > >
> > > > > > Regards,
> > > > > > Shahab
> > > > > >
> > > > > >
> > > > > > On Tue, Jul 22, 2014 at 12:07 AM, Ashish Dobhal <dobhalashish...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > In this case, does b refer to the tuples corresponding to a single
> > > > > > > group? If so, I still do not get the point, because b is a bag that
> > > > > > > contains all the records and not only the records of a single group.
> > > > > > >
> > > > > > > On Jul 21, 2014 8:33 PM, <william.dowl...@thomsonreuters.com> wrote:
> > > > > > > >
> > > > > > > > This was hard for me to get when I started using Pig, and it still
> > > > > > > > annoys me after 1.5 years' experience with Pig. In mathematics and
> > > > > > > > logic, quantifiers (like "for each", "there exists") bind variables
> > > > > > > > that occur in their scope:
> > > > > > > > (for each x)(there exists y) [y > x]
> > > > > > > >
> > > > > > > > The (for each x) binds x in (there exists y) [y > x].
> > > > > > > >
> > > > > > > > But in Pig, the variable x in (for each x) *does not bind occurrences
> > > > > > > > of x* in the following subexpression. IMO this is an unnecessary
> > > > > > > > stumbling block for people learning Pig who have a background in
> > > > > > > > math or logic.
> > > > > > > >
> > > > > > > > Here is how you can read
> > > > > > > >         foreach c generate COUNT(b), group;
> > > > > > > > so it makes sense:
> > > > > > > >         c's components are "group" and (bag) b, so:
> > > > > > > >         foreach (group, b) in c generate COUNT(b), group;
> > > > > > > >
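William's reading lines up with tuple unpacking in Python. In this minimal sketch (plain Python rather than Pig; the sample words are hypothetical), the relation c is modeled as a list of (group, bag) pairs. The comprehension deliberately reuses the name b for the per-group bag, mirroring how b inside 'foreach c' names the bag field of the current record and not the whole relation b.

```python
# Relation b: one word per record (hypothetical sample data).
b = ["pig", "hive", "pig", "pig", "hive", "hadoop"]

# c = group b by word; each record of c is (group, bag of b-records with that key).
c = []
for word in dict.fromkeys(b):  # distinct keys in first-seen order
    c.append((word, [w for w in b if w == word]))

# d = foreach (group, b) in c generate COUNT(b), group;
# Inside the comprehension, 'b' is the per-group bag; it shadows the outer relation b.
d = [(len(b), group) for (group, b) in c]
print(d)

# The comprehension's b was local to it; the outer relation b still has all 6 records.
print(len(b))
```

Python makes the binding explicit in `for (group, b) in c`; Pig leaves it implicit, which is exactly the stumbling block described here.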
> > > > > > > > I would love it if the Pig syntax were extended to allow quantifiers
> > > > > > > > like "foreach (group, b) in c", but I don't know how feasible that
> > > > > > > > would be.
> > > > > > > >
> > > > > > > > William F Dowling
> > > > > > > > Senior Technologist
> > > > > > > > Thomson Reuters
> > > > > > > >
> > > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Ashish Dobhal [mailto:dobhalashish...@gmail.com]
> > > > > > > > Sent: Monday, July 21, 2014 10:34 AM
> > > > > > > > To: user@pig.apache.org
> > > > > > > > Subject: Re: Problem in understanding UDF COUNT
> > > > > > > >
> > > > > > > > Thanks, Shahab.
> > > > > > > > My doubt is: why are we passing the bag b and not the bag c as the
> > > > > > > > argument to the COUNT(b) function?
> > > > > > > > The bag c contains the groups, not the bag b.
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Jul 21, 2014 at 6:21 PM, Shahab Yunus <shahab.yu...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Have you seen this documentation and blog?
> > > > > > > > >
> > > > > >
> > > http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig
> > > > > > > > > / http://pig.apache.org/docs/r0.9.2/func.html#count
> > > > > > > > >
> > > > > > > > > They explain this in detail.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Shahab
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Jul 21, 2014 at 8:44 AM, Ashish Dobhal
> > > > > > > > > <dobhalashish...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > a = load '/user/hue/word_count_text.txt';
> > > > > > > > > > b = foreach a generate flatten(TOKENIZE((chararray)$0)) as word;
> > > > > > > > > > c = group b by word;
> > > > > > > > > > d = foreach c generate COUNT(b), group;
> > > > > > > > > >
> > > > > > > > > > I want to know what the input to the UDF COUNT would be in this
> > > > > > > > > > case. Also, what is the meaning of b being passed as an argument?
> > > > > > > > > >
> > > > > > > > > > Also, I am still not clear about how COUNT operates.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > Ashish
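One way to see what COUNT receives is a plain-Python model of the four statements above. This is a sketch under stated assumptions, not Pig itself: the input lines are made up, TOKENIZE is approximated with `str.split()` (whitespace only, whereas Pig's TOKENIZE also splits on some punctuation), and lists stand in for bags. COUNT is applied once per record of c, and its argument is that record's bag, which holds only the words equal to the record's group.

```python
from collections import defaultdict

# a = load '/user/hue/word_count_text.txt';  (hypothetical file contents)
a = ["the pig ate", "the hive", "pig pig"]

# b = foreach a generate flatten(TOKENIZE((chararray)$0)) as word;
# Approximation: str.split() splits on whitespace only.
# flatten turns each line's bag of tokens into one record per token.
b = [(word,) for line in a for word in line.split()]

# c = group b by word;  -> records shaped (group, bag of b-records)
groups = defaultdict(list)
for rec in b:
    groups[rec[0]].append(rec)
c = list(groups.items())

def count(bag):
    """Model of Pig's COUNT: skips tuples whose first field is null (None)."""
    return sum(1 for t in bag if t[0] is not None)

# d = foreach c generate COUNT(b), group;
# count() is called once per record of c, on that record's own bag.
d = [(count(bag), group) for (group, bag) in c]
print(d)
```

The null check is there because Pig's COUNT ignores tuples whose first field is null; COUNT_STAR counts them as well.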
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
