Awesome -- I really appreciate that insight. Is that recorded anywhere? If
not, then perhaps I'll spend some time writing up on the wiki how these things
are implemented, for when others come along with similar questions.
Thanks, Alan!
Many operators, such as join and group by, are not implemented by a single
physical operation. Their code is also spread across the code base, because
each has logical components and physical components. The logical components of join are
in org.apache.pig.newplan.logical.relational.LOJoin.java. That gets
Thanks Russell -- That's really useful.
Just for kicks and giggles: Where would I look in the code base to see how the
JOIN keyword is implemented? I've found the built-in functions, but not the
keywords (JOIN, GROUP, etc). Perhaps that would give me some hints. Perhaps
it'll show me that a
You can write an EvalFunc UDF that depends on a sort, and there are
several in piggybank that do so. COR (the correlate UDF) is such an
example. You call these UDFs on a relation after ordering it.
For example:
    answers = foreach (group data by key)
    {
        sorted = order data by value;
        generate m