Re: Joins with OR condition

2010-10-17 Thread Dmitriy Ryaboy
Two joins, followed by a full outer join of the results, and a selection pass? It's not pretty, but it'll work... On Sun, Oct 17, 2010 at 5:03 PM, rakesh kothari wrote: > > What's the best way to do something like this in PIG: > > JOIN A with B where (A.property1 = B.property1 OR A.property2 = >
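A minimal sketch of the logic Dmitriy describes: one equi-join per side of the OR, then merging the two results so that pairs matched by both keys appear only once. It is written in Python rather than Pig Latin so the logic can be checked directly. The function names `equi_join` and `or_join` are hypothetical, rows are assumed to be plain dicts, and the set union of matched index pairs stands in for the "full outer join plus selection pass" step. Null keys are skipped, on the assumption that they should never match, as in a SQL-style equi-join.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set, Tuple

Row = Dict[str, Any]


def equi_join(a: List[Row], b: List[Row], key: str) -> Set[Tuple[int, int]]:
    """Hash join on a single key; returns (index_in_a, index_in_b) pairs."""
    index: Dict[Any, List[int]] = defaultdict(list)
    for j, row in enumerate(b):
        if row.get(key) is not None:  # null keys never match
            index[row[key]].append(j)
    return {
        (i, j)
        for i, row in enumerate(a)
        if row.get(key) is not None
        for j in index.get(row[key], [])
    }


def or_join(a: List[Row], b: List[Row], key1: str, key2: str) -> List[Tuple[Row, Row]]:
    """Join a and b where a[key1] == b[key1] OR a[key2] == b[key2]."""
    # Two separate equi-joins, one for each side of the OR condition.
    first = equi_join(a, b, key1)
    second = equi_join(a, b, key2)
    # Merging the pair sets keeps each (a-row, b-row) pair once, even when it
    # matched on both keys; this plays the role of the outer join + selection.
    pairs = sorted(first | second)
    return [(a[i], b[j]) for i, j in pairs]
```

Identifying rows by position keeps the de-duplication exact even when two distinct rows happen to have identical contents. In a real Pig script the same idea would need some unique row identifier carried through both joins.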

Joins with OR condition

2010-10-17 Thread rakesh kothari
What's the best way to do something like this in Pig:

JOIN A with B where (A.property1 = B.property1 OR A.property2 = B.property2)?

Thanks,
-Rakesh

Re: Built-in counters

2010-10-17 Thread Dmitriy Ryaboy
No, not on filters (though every MR job tells you the number of records ingested and the number returned, and as of 0.8 it also tells you which relations were being produced in the job; so you can sort of back into that). EB (Elephant-Bird) sort of gives you 2): most of the loaders in there give you number of malfor

Built-in counters

2010-10-17 Thread Josh Devins
I've seen a few threads about counters, PigStats, Elephant-Bird's stats utility class, etc. http://www.mail-archive.com/pig-u...@hadoop.apache.org/msg00900.html http://www.mail-archive.com/user%40pig.apache.org/msg00034.html Has any progress been made on this or to provide a comprehensive stats/c

RE: accessing remote cluster with Pig

2010-10-17 Thread Gerrit Jansen van Vuuren
Glad it worked for you :) I use the standard Apache Pig distributions. There are several places where environment variables can be changed and set, and I have no idea which one Cloudera uses, but here is a list: /etc/profile.d/ (we have hadoop.sh, pig.sh and java.sh here that sets the home variab