Re: Behavior of JOIN

2010-06-11 Thread hc busy
Oh, I see what my confusion is... It's the nulls on which JOIN behaves differently in Pig than in SQL, right? That's where things differ. On Thu, Jun 10, 2010 at 12:48 PM, Alan Gates wrote: > That's already what happens, because flattening a bag that is empty results in 0 rows, regardle

Re: Behavior of JOIN

2010-06-10 Thread Alan Gates
That's already what happens, because flattening an empty bag results in 0 rows, regardless of how many rows came out of the other bag. Alan. On Jun 10, 2010, at 11:09 AM, hc busy wrote: Isn't that kind of annoying? Since JOIN in SQL is implicitly an inner join. Would have been gre
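
The point about empty bags can be illustrated with a small Python model. This is only a sketch of the semantics described above, not Pig code: the helper name flatten_pair and the sample tuples are hypothetical, and bags are modelled as plain lists of tuples.

```python
from itertools import product


def flatten_pair(bag_a, bag_b):
    """Model of GENERATE FLATTEN(A), FLATTEN(B) for a single group.

    Flattening two bags yields their cross product, so an empty bag on
    either side yields no rows at all.
    """
    return [row_a + row_b for row_a, row_b in product(bag_a, bag_b)]


# Both bags non-empty: one output row per pairing.
rows_both = flatten_pair([(1, "x")], [(1, "p"), (1, "q")])

# B's bag is empty: nothing comes out, however many rows A's bag holds.
rows_empty_b = flatten_pair([(2, "y"), (2, "z")], [])

print(rows_both)
print(rows_empty_b)
```

Under this model, keys present in only one input disappear from the output without any explicit filter, which is the inner-join behavior.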

Re: Behavior of JOIN

2010-06-10 Thread hc busy
Isn't that kind of annoying? Since JOIN in SQL is implicitly an inner join. Would have been great if C = JOIN A by id, B by id; is alias for C1 = COGROUP A by id, B by id; C2 = filter C1 by NOT (IsEmpty(A) OR IsEmpty(B)); C = foreach C2 generate FLATTEN(A), FLATTEN(B); On Tue, Jun 8, 2010 at 12:03 PM,

Re: Behavior of JOIN

2010-06-08 Thread Alan Gates
Historically, C = JOIN A by a, B by a; was defined in Pig Latin as shorthand for: C1 = COGROUP A by a, B by a; C = FOREACH C1 GENERATE flatten(A), flatten(B); which produces the doubling of keys. Also, given that Pig Latin does not require that key names be the same (as USING or NATURAL do in
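
The doubling of keys follows directly from that definition. Here is a rough Python model of it. It is a sketch under stated assumptions, not Pig's implementation: the function names cogroup and join_via_cogroup and the sample relations are hypothetical, relations are lists of tuples, the key is taken by position, and null keys get no special treatment.

```python
from collections import defaultdict
from itertools import product


def cogroup(a_rows, b_rows, a_key, b_key):
    """Model of COGROUP: map each key to a pair of bags (rows of A, rows of B)."""
    groups = defaultdict(lambda: ([], []))
    for row in a_rows:
        groups[row[a_key]][0].append(row)
    for row in b_rows:
        groups[row[b_key]][1].append(row)
    return dict(groups)


def join_via_cogroup(a_rows, b_rows, a_key=0, b_key=0):
    """Model of FOREACH ... GENERATE flatten(A), flatten(B) over the cogroup."""
    out = []
    for bag_a, bag_b in cogroup(a_rows, b_rows, a_key, b_key).values():
        for row_a, row_b in product(bag_a, bag_b):
            # Whole tuples are concatenated, so the key column of A and the
            # key column of B both survive in the output row.
            out.append(row_a + row_b)
    return out


A = [(1, "x"), (2, "y")]
B = [(1, "p"), (1, "q"), (3, "r")]
result = join_via_cogroup(A, B)
print(result)
```

In this model every output row carries the key twice (positions 0 and 2 here), because flatten emits each bag's complete tuples rather than merging the key columns.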

Re: Behavior of JOIN

2010-06-08 Thread Syed Wasti
Curious to know the answer too. To add to this duplicate-columns issue: when I do a FOREACH projection after the join, it errors out if the join fields have the same name, because Pig doesn't know which field to pick. E.g. C = JOIN A BY (var1), B BY (var1); D = FOREACH C GENERATE var1,

Behavior of JOIN

2010-06-08 Thread Alexander Schätzle
Hi all, the JOIN operator of Pig produces duplicate columns in its output. Let's say the statement is like this: C = JOIN A BY (var1, var2), B BY (var1, var2); Then C contains var1 and var2 twice (once for each input relation), of course with the same content. This is somehow not what a user