Hi Jon,
I think an "INVERSE_MAP" would be a good use case. Also, instead of (or in
addition to) KEYSET we should have KEYLIST which does not eliminate
duplicate values.
I would like to help with this if needed; please let me know if you have a
JIRA for this.
Thanks,
Prashant
On Wed, Feb 29,
Thejas, I don't think nested foreaches are in 0.8. IIRC they are only in trunk.
On Thu, Mar 15, 2012 at 3:46 PM, Thejas Nair wrote:
> On 3/13/12 9:02 PM, guoyun wrote:
>
>>>
>>>
>>> You need to use another nested foreach statement. -
>>>
>>> C = foreach B { B1BAG = foreach A generate b.b1; D = di
If you want to run it under a debugger, you can run it in local mode.
java -cp pig.jar org.apache.pig.Main -x local
-Thejas
On 3/12/12 4:50 AM, chethan wrote:
Hi,
Can I write a UDF that overrides the LOAD SimpleTextLoader without MapReduce? I am
a bit confused about the use of MapReduce, because I am
Hi Yen,
Does the function also implement Algebraic? In that case it might end
up using the algebraic interface of the udf.
If your foreach statement has functions that don't implement the Accumulator
interface, then the reduce task won't run in accumulative mode. This is
because you are anyway going t
I don't know if you can do a filterfunc per se, but a hack would be to
return an int (1 if true, 0 otherwise) and filter by
yourudf(input) == 1
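A minimal sketch of that workaround, assuming the script is registered through Pig's Jython support, where the outputSchema decorator is made available to the script. The fallback decorator and the name trivial_filter_flag are illustrative only, added so the file also runs under plain Python:

```python
# Under Pig's Jython engine, outputSchema is assumed to be provided already.
# The fallback below is a stand-in so the file can be run outside Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            # Record the schema string on the function; no other effect.
            func.outputSchema = schema
            return func
        return wrap


@outputSchema("keep:int")
def trivial_filter_flag(s):
    # Return 1 to keep the record and 0 to drop it, instead of a boolean.
    if s:
        return 1
    return 0
```

On the Pig side this might be used as REGISTER 'filters.py' USING jython AS f; followed by B = FILTER A BY f.trivial_filter_flag(s) == 1; where the file name, alias, and relation names are hypothetical.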
2012/3/15 Marco Cova
> Hi all.
>
> I'm trying to write a simple filter function (to be used with the FILTER
> operator) in python, but I don't s
Hi all.
I'm trying to write a simple filter function (to be used with the FILTER
operator) in python, but I don't seem to find the right way to specify its
schema. I'm using Pig 0.9.2.
The filter's code is (trivially):
def trivial_filter(s):
    return True
What's the right
On 3/13/12 9:02 PM, guoyun wrote:
You need to use another nested foreach statement:
C = foreach B { B1BAG = foreach A generate b.b1; D = distinct B1BAG;
generate flatten(group), COUNT(D);}
-Thejas
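For readers less familiar with nested foreach, the computation in that statement (count the distinct b1 values within each group) can be sketched in plain Python. The sample rows and the function name are made up for illustration, and this only mirrors the result, not how Pig executes it:

```python
from collections import defaultdict

# Hypothetical sample rows of the form (group_key, b1).
rows = [("x", 1), ("x", 1), ("x", 2), ("y", 3), ("y", None)]


def distinct_count_per_group(rows):
    seen = defaultdict(set)
    for key, b1 in rows:
        values = seen[key]  # make sure every group appears in the result
        # Pig's COUNT skips tuples whose first field is null, so skip None.
        if b1 is not None:
            values.add(b1)  # the set plays the role of DISTINCT
    return dict((key, len(values)) for key, values in seen.items())


print(distinct_count_per_group(rows))
```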
Thanks, but is this not supported in Pig 0.8.0?
It should work in 0.8. Do you get some erro
Hey all,
I have the following FOREACH with nested block:
```
node_in = FOREACH (GROUP edge BY destination_id) {
    in_degree = COUNT(edge);
    in_edges = edge.(source_id, weight);
    in_edges_sorted = ORDER in_edges BY weight DESC;
    in_indices = SomeUDF(in_edges_sorted.source_id);
    in_weights = An
Hey folks,
Maybe this isn't the best place for this question, but I'm thinking
maybe someone here ran into something similar, so I'll try anyway. I'm
currently trying to run an embedded pig script and then pass my results
on to a separate module that uses matplotlib for some graphing. Problem