Re: Adding keywords to Pig Latin (Was: Question about UDFs and tuple ordering)

2012-10-09 Thread Gianmarco De Francisci Morales
Hi, one grammar is responsible for lexing (tokenization), the second for parsing, and the last is a tree grammar that generates the plan from the AST output by the parser. These are the main ones, but there are also some "minor" ones, like the AST validator and the AST printer. If you had to add a ke
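The three stages described in this message (lexing, parsing, plan generation from the AST) can be illustrated with a toy pipeline. This is only a minimal sketch in plain Python, not Pig's actual grammars or API: the function names `lex`, `parse`, and `build_plan`, the tiny keyword set, and the dictionary used as a "plan" are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical toy keyword set; a real grammar defines many more.
KEYWORDS = {"JOIN", "BY"}


def lex(text):
    """Stage 1 (lexing): turn raw text into (kind, value) tokens."""
    tokens = []
    for word in re.findall(r"[A-Za-z_][A-Za-z_0-9]*|,", text):
        if word == ",":
            tokens.append(("COMMA", word))
        elif word.upper() in KEYWORDS:
            tokens.append(("KEYWORD", word.upper()))
        else:
            tokens.append(("IDENT", word))
    return tokens


def parse(tokens):
    """Stage 2 (parsing): build an AST for `JOIN rel BY key (, rel BY key)*`."""
    pos = 0

    def expect(kind, value=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok_kind, tok_value = tokens[pos]
        if tok_kind != kind or (value is not None and tok_value != value):
            raise SyntaxError(f"unexpected token {tok_value!r}")
        pos += 1
        return tok_value

    expect("KEYWORD", "JOIN")
    inputs = []
    while True:
        relation = expect("IDENT")
        expect("KEYWORD", "BY")
        key = expect("IDENT")
        inputs.append((relation, key))
        if pos < len(tokens) and tokens[pos][0] == "COMMA":
            pos += 1
            continue
        break
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after join clause")
    return ("join", inputs)


def build_plan(ast):
    """Stage 3 (tree walking): turn the AST into a toy logical-plan node."""
    op, inputs = ast
    if op != "join":
        raise ValueError(f"unknown AST node {op!r}")
    return {
        "operator": "join",
        "inputs": [rel for rel, _ in inputs],
        "keys": [key for _, key in inputs],
    }


plan = build_plan(parse(lex("join a by x, b by y")))
```

In this toy version, a word that the lexer does not know as a keyword (for example `superspecialjoin`) is tokenized as a plain identifier, so the parser rejects it. That mirrors why a new keyword has to be introduced at the lexing stage before the later stages can act on it.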

Adding keywords to Pig Latin (Was: Question about UDFs and tuple ordering)

2012-10-09 Thread Brian Stempin
I've taken some time to understand how a Logical Plan progresses to a Physical and MR Plan (thanks for the boost, Alan!). My next question is centered around Logical Plan generation. If one were to add a new keyword (sticking with the theme in my last message, say, SUPERSPECIALJOIN), that keywo

RE: Question about UDFs and tuple ordering

2012-10-05 Thread Brian Stempin
Awesome -- I really appreciate that insight. Is that recorded anywhere? If not, then perhaps I'll spend some time writing about how these things are implemented in the wiki for when others come along with similar questions. Thanks, Alan!

Re: Question about UDFs and tuple ordering

2012-10-05 Thread Alan Gates
Many operators, such as join and group by, are not implemented by a single physical operation. Also, they are spread through the code as they have logical components and physical components. The logical components of join are in org.apache.pig.newplan.logical.relational.LOJoin.java. That gets

RE: Question about UDFs and tuple ordering

2012-10-05 Thread Brian Stempin
Thanks Russell -- That's really useful. Just for kicks and giggles: Where would I look in the code base to see how the JOIN keyword is implemented? I've found the built-in functions, but not the keywords (JOIN, GROUP, etc). Perhaps that would give me some hints. Perhaps it'll show me that a

Re: Question about UDFs and tuple ordering

2012-10-05 Thread Russell Jurney
You can write an EvalFunc UDF that depends on a sort, and there are several in piggybank that do so. COR (the correlate UDF) is such an example. You call these UDFs on a relation after ordering them. For example: answers = foreach (group data by key) { sorted = order data by value; generate m
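The pattern described here (group, order within each group, then call a function that relies on that order) can be modeled outside Pig. The following is a minimal sketch in plain Python, not Pig's EvalFunc API and not the COR UDF: `max_gap` and `group_sort_apply` are hypothetical names, and `max_gap` stands in for any computation that is only correct on sorted input.

```python
from collections import defaultdict


def max_gap(sorted_values):
    """Order-dependent function: largest difference between neighbours.

    Assumes the input is already sorted ascending, the way a UDF would
    rely on an ORDER applied before it is called.
    """
    gaps = [b - a for a, b in zip(sorted_values, sorted_values[1:])]
    return max(gaps) if gaps else 0


def group_sort_apply(rows, func):
    """Group (key, value) rows by key, sort each group, then apply func."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    # Sorting happens before the function is called, so func can rely on it.
    return {key: func(sorted(values)) for key, values in groups.items()}


rows = [("a", 5), ("a", 1), ("a", 2), ("b", 10)]
result = group_sort_apply(rows, max_gap)
```

The function itself never sorts; it trusts its caller to have ordered the data, which is the same division of labour as ordering a relation in the script and then handing it to the UDF.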

Question about UDFs and tuple ordering

2012-10-05 Thread Brian Stempin
Hi, I'm fairly new to writing UDFs and Pig in general. I want to be able to write a UDF that can take advantage of MapReduce's sorting of data. Specifically, I'm trying to conceive how I'd write a UDF to do a specialized join or a pivot. In both cases, sorting would be useful. EvalFunc seems