Re: conditional and multiple generate inside foreach?

Xiaomeng Wan Mon, 25 Jul 2011 09:26:09 -0700

Maybe you can try something like this:

B = foreach A generate name, days_ago,
    FLATTEN(((days_ago == 1) ? {('yesterday'),('week'),('month'),('quarter')} : ((...)?:)));
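A bincond chain like this picks one bag per mutually exclusive range of days_ago. Each branch therefore has to list every label that applies to its range, and each label has to be its own single-field tuple so that FLATTEN emits one row per label. The plain-Python analogue below is only an illustration, not Pig. The function names are hypothetical, and it assumes the thresholds from Dexin's pseudocode (0 = today, 1 = yesterday, <= 7 week, <= 30 month, <= 90 quarter) with days_ago >= 0.

```python
# Python analogue of the nested bincond: each branch covers one mutually
# exclusive range of days_ago and returns *every* label for that range.
# Thresholds follow Dexin's pseudocode (assumption: days_ago >= 0).


def period_labels(days_ago):
    return (('today', 'week', 'month', 'quarter') if days_ago == 0 else
            ('yesterday', 'week', 'month', 'quarter') if days_ago == 1 else
            ('week', 'month', 'quarter') if days_ago <= 7 else
            ('month', 'quarter') if days_ago <= 30 else
            ('quarter',) if days_ago <= 90 else
            ())


def flatten_rows(rows):
    # Mimics FLATTEN on the bag: one output row per label; an empty
    # bag (older than 90 days) produces no output row at all.
    return [(name, days_ago, label)
            for name, days_ago in rows
            for label in period_labels(days_ago)]


for row in flatten_rows([('a', 0), ('b', 1), ('c', 9), ('d', 40)]):
    print(row)
```

The catch with this approach is that every distinct combination of labels needs its own branch, so the chain grows with the number of overlapping periods.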


Shawn

On Sat, Jul 23, 2011 at 7:44 PM, Raghu Angadi <rang...@apache.org> wrote:
> I see three independent questions:
>
>  1. How can we pass the entire row tuple to a UDF, as in 'B = FOREACH A
> GENERATE myudf(A)', without knowing the schema? I don't know if that is
> possible. It does feel like it should be possible.
>
>  2. How can I return an augmented tuple? Your UDF can make a copy of the
> input tuple, add whatever you like to it, and return it. Maybe your
> question is not this simple.
>
>  3. How can I make a UDF produce multiple rows for each input row, as in
> your example:
>       - your UDF needs to return a bag of row tuples. For (b,1) it would
>         return {(b,1,yesterday), (b,1,week), ... }
>       - your Pig script would flatten the output of the UDF:
>         B = foreach A generate FLATTEN( myUDF(name, days_ago) );
>
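Points 2 and 3 together can be sketched as a Jython UDF, in the style Scott suggests further down the thread. This is a minimal sketch under stated assumptions. The file `udfs.py`, the function `expand_periods`, and the schema string are hypothetical. The label thresholds are taken from Dexin's pseudocode. It also relies on Pig's Jython bridge handing a Python list of tuples back as a bag of tuples. The `outputSchema` fallback exists only so the file can be tested with plain Python. Inside Pig, the decorator is supplied when the script is registered.

```python
# Hypothetical udfs.py (Jython UDF): copy the input fields and append one
# label per matching period, returning a list of tuples, which Pig's Jython
# bridge returns as a bag of tuples.

try:
    outputSchema  # provided by Pig when the script is registered
except NameError:
    # No-op stand-in so the file can also be imported and tested outside Pig.
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator

# (label, predicate) pairs from Dexin's pseudocode; every rule that
# matches contributes an output row, so the rules are not exclusive.
RULES = [
    ('today', lambda d: d == 0),
    ('yesterday', lambda d: d == 1),
    ('week', lambda d: d <= 7),
    ('month', lambda d: d <= 30),
    ('quarter', lambda d: d <= 90),
]


@outputSchema('rows:bag{t:tuple(name:chararray,days_ago:int,period:chararray)}')
def expand_periods(name, days_ago):
    if days_ago is None:  # null in Pig -> empty bag -> FLATTEN drops the row
        return []
    row = (name, days_ago)  # copy of the input fields
    return [row + (label,) for label, matches in RULES if matches(days_ago)]
```

If the file were registered as in Scott's message (`register 'udfs.py' using jython as udf;`), the flatten line would read `B = foreach A generate FLATTEN(udf.expand_periods(name, days_ago));`. The code avoids Python 3-only syntax because Pig's Jython runtime is Python 2.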
> Raghu.
>
> On Fri, Jul 22, 2011 at 6:10 PM, Dexin Wang <wangde...@gmail.com> wrote:
>
>> Thanks. I'm not familiar with Python, but I write a bunch of UDFs in Java.
>>
>> One question, though: how do I pass the entire tuple to the UDF? I mean,
>> I can't do something like this:
>>
>>    B = FOREACH A GENERATE myudf(A)
>>
>> Essentially, given a tuple, I want to enrich it with one more field, and
>> the value of the new field depends on the values of some existing fields
>> in the tuple.
>>
>> (a,1) -> (a,1,yesterday)
>>
>> How would I do that?
>>
>> I imagine I could do:
>>   B = GROUP A BY random;
>>   C = FOREACH B GENERATE myudf(A);
>>
>> But I really don't like adding another GROUP BY here.
>>
>> On Fri, Jul 22, 2011 at 5:23 PM, Scott Foster <scottf.con...@gmail.com> wrote:
>>
>> > Hi Dexin,
>> > This is the sort of thing I've started using Python UDFs for. See:
>> > http://wiki.apache.org/pig/UDFsUsingScriptingLanguages for examples of
>> > how to write the python code.
>> >
>> > If your UDF were implemented in Python, you could then do this:
>> >
>> > register 'udfs.py' using jython as udf;
>> > ...
>> > B = FOREACH A generate name, udf.daysAgoString(days_ago);
>> >
>> > scott.
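Scott doesn't show `udfs.py`, so this is only a guess at what `daysAgoString` might look like, not his code. It assumes the function returns the single narrowest matching period and uses the thresholds from Dexin's pseudocode. Because it returns one string, it yields exactly one row per input row (for example, b gets only 'yesterday'). On its own it does not produce the extra week/month/quarter rows Dexin asked for.

```python
# A guess at udfs.py for Scott's example: returns only the narrowest
# matching period, so the FOREACH still yields one row per input row.

try:
    outputSchema  # provided by Pig's Jython runtime
except NameError:
    def outputSchema(schema):  # no-op stand-in for testing outside Pig
        def decorator(func):
            return func
        return decorator


@outputSchema('period:chararray')
def daysAgoString(days_ago):
    if days_ago is None:
        return None
    if days_ago == 0:
        return 'today'
    if days_ago == 1:
        return 'yesterday'
    if days_ago <= 7:
        return 'week'
    if days_ago <= 30:
        return 'month'
    if days_ago <= 90:
        return 'quarter'
    return None  # older than a quarter: Pig receives a null
```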
>> >
>> > On Fri, Jul 22, 2011 at 4:42 PM, Dexin Wang <wangde...@gmail.com> wrote:
>> > > Is it possible to use a conditional and more than one generate inside a foreach?
>> > >
>> > > For example, I have tuples like this (name, days_ago):
>> > >
>> > > (a,0)
>> > > (b,1)
>> > > (c,9)
>> > > (d,40)
>> > >
>> > > b showed up 1 day ago, so it belongs to all of the following: yesterday,
>> > > last week, last month, and last quarter. So I'd like to turn the above into:
>> > >
>> > > (a,0,today)
>> > > (b,1,yesterday)
>> > > (b,1,week)
>> > > (b,1,month)
>> > > (b,1,quarter)
>> > > (c,9,month)
>> > > (c,9,quarter)
>> > > (d,40,quarter)
>> > >
>> > > I imagine (dream) I could do something like this:
>> > >
>> > > B = FOREACH A
>> > >  {
>> > >        if (days_ago <= 90) generate name,days_ago,'quarter';
>> > >        if (days_ago <= 30) generate name,days_ago,'month';
>> > >        if (days_ago <= 7)   generate name,days_ago,'week';
>> > >        if (days_ago == 1)   generate name,days_ago,'yesterday';
>> > >        if (days_ago == 0)   generate name,days_ago,'today';
>> > >  }
>> > >
>> > > Of course that's not valid syntax. I could write my own UDF, but it would
>> > > be nice if there were some way to get what I want without a UDF.
>> > >
>> > > Thanks!
>> > > Dexin
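Read literally, the pseudocode checks every `if` independently for each input row and emits one row for each condition that holds. This plain-Python sketch (not Pig; `generate_rows` is a made-up name) shows that reading. Under these rules, (a,0) also produces quarter, month and week rows in addition to today. The sample output listed earlier in the message does not include those rows, so the intended rule for 'today' is ambiguous as written.

```python
# Literal reading of the pseudocode: conditions are checked independently,
# and each one that holds "generates" a row (plain Python, not Pig).


def generate_rows(name, days_ago):
    if days_ago <= 90:
        yield (name, days_ago, 'quarter')
    if days_ago <= 30:
        yield (name, days_ago, 'month')
    if days_ago <= 7:
        yield (name, days_ago, 'week')
    if days_ago == 1:
        yield (name, days_ago, 'yesterday')
    if days_ago == 0:
        yield (name, days_ago, 'today')


A = [('a', 0), ('b', 1), ('c', 9), ('d', 40)]
B = [row for name, days_ago in A for row in generate_rows(name, days_ago)]
for row in B:
    print(row)
```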
>> > >
>> >
>>
>
