Two joins, followed by a full outer join of the results, and a selection
pass?
It's not pretty, but it'll work...
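To make the idea concrete, here is a minimal Python sketch of the decomposition (not Pig itself): one equi-join per property, then a selection pass that keeps each matched pair once. It swaps the full outer join of the two results for a simple "seen" set, which does the same dedup job in memory. The `id` field and the sample rows are made up for illustration; in Pig the two equi-joins would be along the lines of `JOIN A BY property1, B BY property1` and the same for `property2`.

```python
from collections import defaultdict


def equi_join(left, right, key):
    """Hash join: pair rows of left and right whose `key` values are equal."""
    index = defaultdict(list)
    for r in right:
        if r[key] is not None:  # rows with a missing key never match
            index[r[key]].append(r)
    return [(l, r) for l in left for r in index.get(l[key], [])]


def or_join(left, right, keys=("property1", "property2"), id_field="id"):
    """Emulate 'k1 matches OR k2 matches' with one equi-join per key.

    `id_field` is a hypothetical unique row id used only to recognise
    a pair that was already produced by an earlier equi-join.
    """
    seen = set()
    result = []
    for key in keys:
        for l, r in equi_join(left, right, key):
            pair = (l[id_field], r[id_field])
            if pair not in seen:  # selection pass: keep each pair once
                seen.add(pair)
                result.append((l, r))
    return result


# Made-up sample data.
A = [
    {"id": 1, "property1": "x", "property2": "p"},
    {"id": 2, "property1": "y", "property2": "q"},
    {"id": 3, "property1": "z", "property2": "r"},
]
B = [
    {"id": 10, "property1": "x", "property2": "p"},  # matches A 1 on both
    {"id": 11, "property1": "w", "property2": "q"},  # matches A 2 on property2
    {"id": 12, "property1": "v", "property2": "s"},  # matches nothing
]

pairs = or_join(A, B)
matched_ids = sorted((l["id"], r["id"]) for l, r in pairs)
print(matched_ids)
```

The pair that matches on both properties shows up in both equi-joins, which is why some kind of selection or dedup pass is needed at the end.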
On Sun, Oct 17, 2010 at 5:03 PM, rakesh kothari wrote:
What's the best way to do something like this in PIG:
JOIN A with B where (A.property1 = B.property1 OR A.property2 = B.property2) ?
Thanks,
-Rakesh
No on Filters (though every MR job tells you the number of records ingested,
and the number returned, and as of 0.8 it also tells you which relations
were being produced in the job -- so you can sort of back into that).
EB sort of gives you 2), most of the loaders in there give you number of
malfor
I've seen a few threads about counters, PigStats, Elephant-Bird's stats
utility class, etc.
http://www.mail-archive.com/pig-u...@hadoop.apache.org/msg00900.html
http://www.mail-archive.com/user%40pig.apache.org/msg00034.html
Has any progress been made on this or to provide a comprehensive
stats/c
Glad it worked for you :)
I use the standard Apache Pig distributions.
There are several places where environment variables can be set or changed.
I have no idea which one Cloudera uses, but here is a list:
/etc/profile.d/ (we have hadoop.sh, pig.sh and java.sh here that
set the home variab