Re: conditional and multiple generate inside foreach?

2011-07-22 Thread Dexin Wang
Thanks. I'm not familiar with Python, but I have written a bunch of UDFs in Java. One question though: how do I pass the entire tuple to the UDF? I mean I can't do something like this: B = FOREACH A GENERATE myudf(A) Essentially what I want is given a tuple, I want to enrich the tuple to add one

Re: conditional and multiple generate inside foreach?

2011-07-22 Thread Scott Foster
Hi Dexin, This is the sort of thing I've started using Python UDFs for. See: http://wiki.apache.org/pig/UDFsUsingScriptingLanguages for examples of how to write the Python code. If your UDF was implemented in Python you could then do this... register 'udfs.py' using jython as udf; ... B = FOREACH

Re: PigStorage's handling of InputFormat and OutputFormat

2011-07-22 Thread Raghu Angadi
Thanks guys. Updated PIG-2187 with a new patch. On Fri, Jul 22, 2011 at 3:44 PM, Daniel Dai wrote: > Yes, I am talking about PigTextOutputFormat. > > On Fri, Jul 22, 2011 at 2:51 PM, Raghu Angadi wrote: > > > On Fri, Jul 22, 2011 at 1:29 PM, Daniel Dai > wrote: > > > > > I mean StoreFunc that

conditional and multiple generate inside foreach?

2011-07-22 Thread Dexin Wang
Is it possible to do a conditional and more than one GENERATE inside a FOREACH? For example, I have tuples like this (names, days_ago) (a,0) (b,1) (c,9) (d,40) b shows up 1 day ago, so it belongs to all of the following: yesterday, last week, last month, and last quarter. So I'd like to turn the above t
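The expansion asked about above can be sketched in plain Python, for instance as the body of a Jython UDF. This is only an illustration of the logic, not a Pig answer from the thread: the period labels, the day thresholds (1, 7, 30, 90) and the function names are all assumptions, since the message does not state them and its expected output is cut off.

```python
# Hypothetical period boundaries in days; the original message does not give them.
PERIODS = [
    ("yesterday", 1),
    ("last_week", 7),
    ("last_month", 30),
    ("last_quarter", 90),
]


def periods_for(name, days_ago):
    """Return one (name, period) pair for every period that covers days_ago."""
    return [(name, label) for label, limit in PERIODS if days_ago <= limit]


def expand(rows):
    """Turn each (name, days_ago) row into zero or more (name, period) rows."""
    out = []
    for name, days_ago in rows:
        out.extend(periods_for(name, days_ago))
    return out
```

With these assumed thresholds, a row such as (b,1) expands to four rows and (d,40) to a single last_quarter row. In Pig itself, a function like this would return a bag that is then flattened.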

Re: Schema changes when storing to a file

2011-07-22 Thread Daniel Dai
Yes, it is strongly recommended to use 0.8.1, in which we fixed quite a few important bugs. Daniel On Fri, Jul 22, 2011 at 6:30 AM, Andrew Clegg wrote: > Hello again, > > I have a relation with the following schema: > > regrouped: {group: (artistid: int,country: int,week: > chararray),projected_joi

Re: PigStorage's handling of InputFormat and OutputFormat

2011-07-22 Thread Daniel Dai
Yes, I am talking about PigTextOutputFormat. On Fri, Jul 22, 2011 at 2:51 PM, Raghu Angadi wrote: > On Fri, Jul 22, 2011 at 1:29 PM, Daniel Dai wrote: > > > I mean StoreFunc that delegate outputformat to PigOutputFormat. > > > > > > Though > > PigOutputFormat is not in package org.apache.pig, i

Re: PigStorage's handling of InputFormat and OutputFormat

2011-07-22 Thread Raghu Angadi
Makes sense. I will attach an updated patch that moves Tuple serialization to StorageUtil. Since we expect users to extend PigStorage, I would like to add a getFieldDelimiter() method; otherwise the extender has to parse and remember it. Raghu. On Fri, Jul 22, 2011 at 3:10 PM, Alan Gates wrote: > "Th

Re: PigStorage's handling of InputFormat and OutputFormat

2011-07-22 Thread Alan Gates
"There are very few StoreFuncs that extend PigStorage" that we know of. We don't know how our users are extending it for themselves. And PigStorage is a public interface. Breaking it is a non-starter. Alan. On Jul 22, 2011, at 2:57 PM, Raghu Angadi wrote: > Yes, I don't like the extra copie

Re: PigStorage's handling of InputFormat and OutputFormat

2011-07-22 Thread Raghu Angadi
Yes, I don't like the extra copies either; that's why I didn't mark the Jira 'patch available'. A static helper method would also be useful. But I don't see how it breaks existing StoreFuncs or output formats. Is there an example? There are very few StoreFuncs that extend PigStorage.

Re: PigStorage's handling of InputFormat and OutputFormat

2011-07-22 Thread Raghu Angadi
On Fri, Jul 22, 2011 at 1:29 PM, Daniel Dai wrote: > I mean StoreFunc that delegate outputformat to PigOutputFormat. > Though > PigOutputFormat is not in package org.apache.pig, it is the OutputFormat of > PigStorage, There is no reference to PigOutputFormat in PigStorage. Did you mean PigT

Re: PigStorage's handling of InputFormat and OutputFormat

2011-07-22 Thread Alan Gates
At this point I'm -1 on this. I don't want to break existing output formats or store functions. And I don't see that much value here. You can accomplish the same thing by putting the logic in a static method of PigTextOutputFormat and letting other users use it. Also, the cost of an extra co

Re: PigStorage's handling of InputFormat and OutputFormat

2011-07-22 Thread Daniel Dai
I mean a StoreFunc that delegates its OutputFormat to PigOutputFormat. Though PigOutputFormat is not in package org.apache.pig, it is the OutputFormat of PigStorage, which many users will use as a reference implementation for a StoreFunc. Daniel On Fri, Jul 22, 2011 at 12:24 PM, Raghu Angadi wrote: > at

Re: PigStorage's handling of InputFormat and OutputFormat

2011-07-22 Thread Raghu Angadi
Attached a patch to https://issues.apache.org/jira/browse/PIG-2187 The only drawback is the extra copies required to make a Text(). On Thu, Jul 21, 2011 at 1:21 PM, Daniel Dai wrote: > I agree tuple -> text conversion better be in StoreFunc. User may have > better chance to reuse OutputFormat. > > Fo

Schema changes when storing to a file

2011-07-22 Thread Andrew Clegg
Hello again, I have a relation with the following schema: regrouped: {group: (artistid: int,country: int,week: chararray),projected_joined_albums: {key: (artistid: int,country: int,week: chararray),timestamp: long,albumid: int,numtracks: long,reach: int,title_len: long}} having grouped the proje

Re: Confused by FOREACH .. GENERATE .. TOP semantics

2011-07-22 Thread Andrew Clegg
Dmitriy -- my requirements have changed slightly in this particular instance: I now need to order by several columns, so I think that means I have to use an inner order-by rather than TOP. Thankfully the bags are small. Daniel -- I'm working on extracting out a small test case that demon

Re: Confused by FOREACH .. GENERATE .. TOP semantics

2011-07-22 Thread Dmitriy Ryaboy
On the subject of TOP -- the reason you would use it instead of an inner order + limit is that it's much more efficient for large bags. It is algebraic, so the computation can be well optimized. On top of that, it does not require a full sort of the bag. -D On Thu, Jul 21, 2011 at 9:41 PM, Daniel
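The point about avoiding a full sort can be illustrated with a bounded min-heap, which is one common way to select the n largest items. This is a generic Python sketch; the function name top_n is hypothetical, and whether Pig's TOP is implemented exactly this way is not stated in the message.

```python
import heapq


def top_n(bag, n, key):
    """Return the n largest items of bag by key, largest first.

    Keeps at most n entries in a min-heap, so the whole bag is never sorted.
    """
    if n <= 0:
        return []
    heap = []  # entries are (key value, insertion index, item); smallest kept entry is heap[0]
    for index, item in enumerate(bag):
        entry = (key(item), index, item)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # The new item beats the smallest kept item, so swap it in.
            heapq.heapreplace(heap, entry)
    # Only the n survivors are sorted, not the full bag.
    return [item for _, _, item in sorted(heap, reverse=True)]
```

For a bag of m items this does roughly m heap operations on a heap of size n, instead of sorting all m items and then applying a limit.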