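Maybe you can try something like this:

B = foreach A generate name, days_ago, FLATTEN(((days_ago == 1) ? {('yesterday'),('week'),('month'),('quarter')} : ((...)?:)));

Each label has to be its own tuple inside the bag. With {('yesterday','week','month','quarter')} the bag holds a single four-field tuple, so FLATTEN would give you one wide row instead of four rows.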
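If the nested bincond gets unwieldy, the UDF route from point 3 in Raghu's mail (return a bag of row tuples, then FLATTEN it) might look roughly like the sketch below. This is only an illustration: the function name days_ago_labels is made up, and I am assuming that 0 maps to 'today' only, since that is what the expected output in the original question shows. The thresholds (1, 7, 30, 90) are taken from the pseudo-code in that question. Registering it with Pig is not shown here; see the wiki page Scott linked for that part.

```python
# Hypothetical helper: builds the "bag" of rows for one input row.
# Under Pig's Jython support, an @outputSchema(...) decorator would normally
# declare the bag schema; it is left out so this runs as plain Python.

def days_ago_labels(name, days_ago):
    """Return a list of (name, days_ago, label) tuples, one per matching bucket."""
    # Assumption: day 0 is reported as 'today' only, as in the expected output.
    if days_ago == 0:
        return [(name, days_ago, 'today')]

    labels = []
    if days_ago == 1:
        labels.append('yesterday')
    if 1 <= days_ago <= 7:
        labels.append('week')
    if 1 <= days_ago <= 30:
        labels.append('month')
    if 1 <= days_ago <= 90:
        labels.append('quarter')

    # One tuple per label; an empty list means the row produces no output.
    return [(name, days_ago, label) for label in labels]


if __name__ == '__main__':
    for row in [('a', 0), ('b', 1), ('c', 9), ('d', 40)]:
        for out in days_ago_labels(*row):
            print(out)
```

On the Pig side the call would then be wrapped in FLATTEN, the same way as in Raghu's example with myUDF(name, days_ago).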
Shawn

On Sat, Jul 23, 2011 at 7:44 PM, Raghu Angadi <rang...@apache.org> wrote:

> I see 3 independent questions:
>
> 1. How can we pass the entire row tuple to a UDF, as in
>    'B = FOREACH A GENERATE myudf(A)', without knowing the schema? I don't
>    know if that is possible. It does feel like it should be possible.
>
> 2. How can I return an augmented tuple? Your UDF can make a copy of the
>    input tuple, add whatever you like to it, and return it. Maybe your
>    question is not this simple.
>
> 3. How can I make a UDF produce multiple rows for one input row, as in
>    your example?
>    - Your UDF needs to return a bag of row tuples. For (b,1) it would
>      return {(b,1,yesterday), (b,1,week), ... }
>    - Your Pig script would flatten the output of the UDF:
>      B = foreach A generate FLATTEN( myUDF(name, days_ago) );
>
> Raghu.
>
> On Fri, Jul 22, 2011 at 6:10 PM, Dexin Wang <wangde...@gmail.com> wrote:
>
>> Thanks. I'm not familiar with Python, but I write a bunch of UDFs in Java.
>>
>> One question though: how do I pass the entire tuple to the UDF? I mean, I
>> can't do something like this:
>>
>> B = FOREACH A GENERATE myudf(A)
>>
>> Essentially, given a tuple, I want to enrich it by adding one more field,
>> where the value of the new field depends on the values of some existing
>> fields in the tuple.
>>
>> (a,1) -> (a,1,yesterday)
>>
>> How would I do that?
>>
>> I imagine I can do
>> B = GROUP A BY random;
>> C = FOREACH B GENERATE myudf(A);
>>
>> But I really don't like adding another GROUP BY here.
>>
>> On Fri, Jul 22, 2011 at 5:23 PM, Scott Foster <scottf.con...@gmail.com> wrote:
>>
>> > Hi Dexin,
>> > This is the sort of thing I've started using Python UDFs for. See
>> > http://wiki.apache.org/pig/UDFsUsingScriptingLanguages for examples of
>> > how to write the Python code.
>> >
>> > If your UDF was implemented in Python you could then do this...
>> >
>> > register 'udfs.py' using jython as udf;
>> > ...
>> > B = FOREACH A generate name, udf.daysAgoString(days_ago);
>> >
>> > scott.
>> >
>> > On Fri, Jul 22, 2011 at 4:42 PM, Dexin Wang <wangde...@gmail.com> wrote:
>> > > Is it possible to do a conditional and more than one generate inside
>> > > a foreach?
>> > >
>> > > For example, I have tuples like this (name, days_ago):
>> > >
>> > > (a,0)
>> > > (b,1)
>> > > (c,9)
>> > > (d,40)
>> > >
>> > > b showed up 1 day ago, so it belongs to all of the following:
>> > > yesterday, last week, last month, and last quarter. So I'd like to
>> > > turn the above into:
>> > >
>> > > (a,0,today)
>> > > (b,1,yesterday)
>> > > (b,1,week)
>> > > (b,1,month)
>> > > (b,1,quarter)
>> > > (c,9,month)
>> > > (c,9,quarter)
>> > > (d,40,quarter)
>> > >
>> > > I imagine/dream I could do something like this:
>> > >
>> > > B = FOREACH A
>> > > {
>> > >   if (days_ago <= 90) generate name,days_ago,'quarter';
>> > >   if (days_ago <= 30) generate name,days_ago,'month';
>> > >   if (days_ago <= 7) generate name,days_ago,'week';
>> > >   if (days_ago == 1) generate name,days_ago,'yesterday';
>> > >   if (days_ago == 0) generate name,days_ago,'today';
>> > > }
>> > >
>> > > Of course that's not valid syntax. I could write my own UDF, but it
>> > > would be nice if there were some way to get what I want without a UDF.
>> > >
>> > > Thanks!
>> > > Dexin