RE: Question about UDFs and tuple ordering

2012-10-05 Thread Brian Stempin
Awesome -- I really appreciate that insight. Is that recorded anywhere? If not, then perhaps I'll spend some time writing about how these things are implemented in the wiki for when others come along with similar questions. Thanks, Alan!

Re: Question about UDFs and tuple ordering

2012-10-05 Thread Alan Gates
Many operators, such as join and group by, are not implemented by a single physical operation. They are also spread through the code, because each has logical components and physical components. The logical components of join are in org.apache.pig.newplan.logical.relational.LOJoin.java. That gets

RE: Question about UDFs and tuple ordering

2012-10-05 Thread Brian Stempin
Thanks Russell -- That's really useful. Just for kicks and giggles: Where would I look in the code base to see how the JOIN keyword is implemented? I've found the built-in functions, but not the keywords (JOIN, GROUP, etc.). Perhaps that would give me some hints. Perhaps it'll show me that a

Re: Question about UDFs and tuple ordering

2012-10-05 Thread Russell Jurney
You can write an EvalFunc UDF that depends on a sort, and there are several in piggybank that do so. COR (the correlate UDF) is one such example. You call these UDFs on a relation after ordering it. For example: answers = foreach (group data by key) { sorted = order data by value; generate m