Thanks. I'm not familiar with Python, but I write a bunch of UDFs in Java.
One question though: how do I pass the entire tuple to the UDF? I mean I
can't do something like this:
B = FOREACH A GENERATE myudf(A)
Essentially what I want is given a tuple, I want to enrich the tuple to add
one
Hi Dexin,
This is the sort of thing I've started using Python UDFs for. See
http://wiki.apache.org/pig/UDFsUsingScriptingLanguages for examples of
how to write the Python code.
If your UDF were implemented in Python, you could then do this...
register 'udfs.py' using jython as udf;
...
B = FOREACH
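A minimal sketch of the tuple-enrichment idea discussed above, written as a plain Python function of the kind that could live in udfs.py. The function name enrich, the single-tuple argument, and the appended field are all hypothetical. How Pig hands the fields to the function, and how an output schema is declared, are not shown here.

```python
def enrich(record):
    """Return a copy of `record` with one extra field appended.

    Assumption: the whole input tuple arrives as a single argument.
    """
    fields = tuple(record)
    # Hypothetical enrichment: append the number of original fields.
    # A real UDF would compute whatever extra value the job needs.
    extra = len(fields)
    return fields + (extra,)


if __name__ == "__main__":
    print(enrich(("a", 0)))
```

Returning a new tuple rather than modifying the input keeps the function easy to test outside Pig.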
Thanks guys. Updated PIG-2187 with a new patch.
On Fri, Jul 22, 2011 at 3:44 PM, Daniel Dai wrote:
> Yes, I am talking about PigTextOutputFormat.
>
> On Fri, Jul 22, 2011 at 2:51 PM, Raghu Angadi wrote:
>
> > On Fri, Jul 22, 2011 at 1:29 PM, Daniel Dai wrote:
> >
> > > I mean StoreFunc that
Is it possible to do a conditional and more than one GENERATE inside a FOREACH?
For example, I have tuples like this (names, days_ago):
(a,0)
(b,1)
(c,9)
(d,40)
b shows up 1 day ago, so it belongs to all of the following: yesterday, last
week, last month, and last quarter. So I'd like to turn the above t
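A rough sketch of the bucketing logic described in this question, in plain Python rather than Pig Latin. The cutoffs (1, 7, 30 and 90 days), the label spellings, and the function names are assumptions made for illustration. The original message does not state them, and the desired output format is cut off above.

```python
# Hypothetical cutoffs, in days, for each reporting period.
PERIODS = (
    ("yesterday", 1),
    ("last_week", 7),
    ("last_month", 30),
    ("last_quarter", 90),
)


def periods_for(days_ago):
    """Return every period label that a given age in days falls into."""
    return [label for label, cutoff in PERIODS if days_ago <= cutoff]


def expand(rows):
    """Turn (name, days_ago) rows into one (name, period) row per match."""
    return [
        (name, label)
        for name, days_ago in rows
        for label in periods_for(days_ago)
    ]


if __name__ == "__main__":
    for row in expand([("a", 0), ("b", 1), ("c", 9), ("d", 40)]):
        print(row)
```

Under these assumed cutoffs, b (1 day ago) lands in all four periods, matching the example in the question, while d (40 days ago) lands only in the last quarter.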
Yes, it is strongly recommended to use 0.8.1, in which we fixed quite a few
important bugs.
Daniel
On Fri, Jul 22, 2011 at 6:30 AM, Andrew Clegg wrote:
> Hello again,
>
> I have a relation with the following schema:
>
> regrouped: {group: (artistid: int,country: int,week:
> chararray),projected_joi
Yes, I am talking about PigTextOutputFormat.
On Fri, Jul 22, 2011 at 2:51 PM, Raghu Angadi wrote:
> On Fri, Jul 22, 2011 at 1:29 PM, Daniel Dai wrote:
>
> > I mean StoreFunc that delegate outputformat to PigOutputFormat.
>
>
>
>
> > Though
> > PigOutputFormat is not in package org.apache.pig, i
Makes sense. I will attach an updated patch that moves Tuple serialization to
StorageUtil.
Since we expect users to extend PigStorage, I would like to add a
getFieldDelimiter() method; otherwise the extender has to parse and
remember it.
Raghu.
On Fri, Jul 22, 2011 at 3:10 PM, Alan Gates wrote:
> "Th
"There are very few StoreFuncs that extend PigStorage" that we know of. We
don't know how our users are extending it for themselves. And PigStorage is a
public interface. Breaking it is a non-starter.
Alan.
On Jul 22, 2011, at 2:57 PM, Raghu Angadi wrote:
> Yes, I don't like the extra copie
Yes, I don't like the extra copies either; that's why I didn't mark the Jira
'patch available'. A static helper method would also be useful.
But I don't see how it breaks existing StoreFuncs or output
formats. Is there an example? There are very few StoreFuncs that extend
PigStorage.
On Fri, Jul 22, 2011 at 1:29 PM, Daniel Dai wrote:
> I mean StoreFunc that delegate outputformat to PigOutputFormat.
> Though
> PigOutputFormat is not in package org.apache.pig, it is the OutputFormat of
> PigStorage,
There is no reference to PigOutputFormat in PigStorage. Did you mean
PigT
At this point I'm -1 on this. I don't want to break existing output formats or
store functions. And I don't see that much value here. You can accomplish the
same thing by putting the logic in a static method of PigTextOutputFormat and
letting other users use it. Also, the cost of an extra co
I mean a StoreFunc that delegates its output format to PigOutputFormat. Though
PigOutputFormat is not in package org.apache.pig, it is the OutputFormat of
PigStorage, which many users will use as a reference implementation for a
StoreFunc.
Daniel
On Fri, Jul 22, 2011 at 12:24 PM, Raghu Angadi wrote:
> at
Attached a patch to https://issues.apache.org/jira/browse/PIG-2187
The only drawback is the extra copies required to make a Text().
On Thu, Jul 21, 2011 at 1:21 PM, Daniel Dai wrote:
> I agree tuple -> text conversion better be in StoreFunc. User may have
> better chance to reuse OutputFormat.
>
> Fo
Hello again,
I have a relation with the following schema:
regrouped: {group: (artistid: int,country: int,week:
chararray),projected_joined_albums: {key: (artistid: int,country:
int,week: chararray),timestamp: long,albumid: int,numtracks:
long,reach: int,title_len: long}}
having grouped the proje
Dmitriy -- my requirements have changed slightly in this particular
instance. I now need to order by several columns, so I think
that means I have to use an inner order-by rather than TOP.
Thankfully the bags are small.
Daniel -- I'm working on extracting out a small test case that
demon
On the subject of TOP -- the reason you would use it instead of an inner
order + limit is that it's much more efficient for large bags.
It is algebraic, so the computation can be well optimized. On top of that,
it does not require a full sort of the bag.
-D
On Thu, Jul 21, 2011 at 9:41 PM, Daniel
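To illustrate the point above about TOP not needing a full sort, here is a small Python analogy. It is not Pig's implementation. It only contrasts a full sort followed by a slice with a heap-based selection that keeps just n candidates at a time. The function names and the sample rows are made up for the example.

```python
import heapq


def top_n_full_sort(bag, n, column):
    """Order + limit: sort the whole bag, then keep the first n rows."""
    return sorted(bag, key=lambda row: row[column], reverse=True)[:n]


def top_n_heap(bag, n, column):
    """Heap-based selection: never holds more than n candidates."""
    return heapq.nlargest(n, bag, key=lambda row: row[column])


if __name__ == "__main__":
    # Hypothetical (name, score) rows.
    bag = [("a", 5), ("b", 42), ("c", 17), ("d", 8), ("e", 23)]
    print(top_n_full_sort(bag, 2, 1))
    print(top_n_heap(bag, 2, 1))
```

Both functions return the same rows; the difference is the work done on large bags, which is the efficiency argument made in the message.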