Re: Better map support?

2012-03-15 Thread Prashant Kommireddi
Hi Jon, I think an "INVERSE_MAP" would be a good use case. Also, instead of (or in addition to) KEYSET we should have KEYLIST which does not eliminate duplicate values. I would like to help on this if needed, please let me know if you have a JIRA against this. Thanks, Prashant On Wed, Feb 29,

Re: about distinct

2012-03-15 Thread Dmitriy Ryaboy
Thejas, I don't think nested foreaches are in 0.8. They are only in trunk, iirc. On Thu, Mar 15, 2012 at 3:46 PM, Thejas Nair wrote: > On 3/13/12 9:02 PM, guoyun wrote: >>> You need to use another nested foreach statement. >>> C = foreach B { B1BAG = foreach A generate b.b1; D = di

Re: UDF for LOAD SimpleTextLoader without mapreduce.

2012-03-15 Thread Thejas Nair
If you want to run it under a debugger, you can run it in local mode: java -cp pig.jar org.apache.pig.Main -x local -Thejas On 3/12/12 4:50 AM, chethan wrote: Hi, Can I write a UDF that overrides LOAD SimpleTextLoader without mapreduce? I am a bit confused about the use of mapreduce, because I am

Re: Accumulator is not fired

2012-03-15 Thread Thejas Nair
Hi Yen, Does the function also implement Algebraic? In that case it might end up using the algebraic interface of the UDF. If your foreach statement has functions that don't implement the Accumulator interface, then the reduce task won't run in accumulative mode. This is because you are anyway going t

Re: python filter udfs

2012-03-15 Thread Jonathan Coveney
I don't know if you can do a filterfunc per se, but a hack would be to return an int (1 if true, 0 otherwise) and filter by yourudf(input) == 1. 2012/3/15 Marco Cova > Hi all. > > I'm trying to write a simple filter function (to be used with the FILTER > operator) in python, but I don't s
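A minimal sketch of the int-returning workaround suggested above, written in Python. The file name `filters.py`, the alias `myfuncs`, and the field name `keep` are hypothetical. The fallback `outputSchema` definition is only a stand-in so the file also runs outside Pig; it assumes Pig's Jython engine supplies the real decorator when the script is registered.

```python
# filters.py (hypothetical file name)

# Assumption: when Pig registers this file with Jython, it provides the
# outputSchema decorator. Outside Pig the name is undefined, so define a
# no-op stand-in that only records the schema string on the function.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("keep:int")
def trivial_filter(s):
    # Return 1 to keep the record and 0 to drop it, instead of a boolean.
    return 1 if s else 0


# Possible use from Pig Latin (relation and alias names are illustrative):
#   REGISTER 'filters.py' USING jython AS myfuncs;
#   kept = FILTER data BY myfuncs.trivial_filter(s) == 1;
```

The comparison against 1 in the FILTER expression is what turns the int result back into a boolean condition.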

python filter udfs

2012-03-15 Thread Marco Cova
Hi all. I'm trying to write a simple filter function (to be used with the FILTER operator) in python, but I don't seem to find the right way to specify its schema. I'm using pig 0.9.2. The filter's code is (trivially): def trivial_filter(s): return True What's the right

Re: about distinct

2012-03-15 Thread Thejas Nair
On 3/13/12 9:02 PM, guoyun wrote: You need to use another nested foreach statement. C = foreach B { B1BAG = foreach A generate b.b1; D = distinct B1BAG; generate flatten(group), COUNT(D);} -Thejas Thanks, but is it not supported in Pig 0.8.0? It should work in 0.8. Do you get some erro
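For readers unfamiliar with the pattern, the nested statement quoted above appears to count the distinct b1 values within each group. A plain-Python sketch of that computation follows. It assumes the input is a list of (group key, b1) pairs, which is a guess since the thread does not show the schema, and `distinct_count_per_group` is a hypothetical name.

```python
from collections import defaultdict


def distinct_count_per_group(rows):
    """Count distinct b1 values per group key.

    rows: iterable of (group_key, b1) pairs (assumed shape).
    """
    seen = defaultdict(set)
    for group_key, b1 in rows:
        # A set drops duplicates, mirroring DISTINCT inside the nested block.
        seen[group_key].add(b1)
    # len(set) plays the role of COUNT(D) for each group.
    return {group_key: len(values) for group_key, values in seen.items()}


rows = [("x", 1), ("x", 1), ("x", 2), ("y", 5)]
counts = distinct_count_per_group(rows)
```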

FOREACH nested block aliases and output schema field names

2012-03-15 Thread Andy Schlaikjer
Hey all, I have the following FOREACH with nested block: ``` node_in = FOREACH (GROUP edge BY destination_id) { in_degree = COUNT(edge); in_edges = edge.(source_id, weight); in_edges_sorted = ORDER in_edges BY weight DESC; in_indices = SomeUDF(in_edges_sorted.source_id); in_weights = An

Embedded Pig and MatPlotLib

2012-03-15 Thread Eli Finkelshteyn
Hey folks, Maybe this isn't the best place for this question, but I'm thinking maybe someone here ran into something similar, so I'll try anyway. I'm currently trying to run an embedded pig script and then pass my results on to a separate module that uses matplotlib for some graphing. Problem