"Invalid alias" from UDF output

2010-06-11 Thread Dave Viner
I am having a problem getting Pig 0.7.0 to use a variable I add from a UDF. Here's the basic pig script: LOGS = LOAD '$INPUT' USING PigStorage('\t') ; IMP_SID = FOREACH IMPRESSIONS_ONLY GENERATE *, (($4 == 'NULL') ? null : (chararray)$4) AS my_id:chararray; ORD_SID = FOREACH IMP_SID GENERAT

Re: Help with a tricky query

2010-06-11 Thread Dmitriy Ryaboy
hc, self-join should just work, but if it doesn't: split table into table_2 if 1==1, table_3 if 1==1; OR table_2 = foreach table generate *; table_3 = foreach table generate *; AND THEN T = join table by id1, table_2 by id2, table_3 by id3 -D On Fri, Jun 11, 2010 at 10:59 AM, hc busy wrote:

UDF: group schema

2010-06-11 Thread Syed Wasti
Hi, I have written a UDF to sort the grouped data on a given field (in my case a date field) and return the sorted data in a databag. I want my method to get the schema of the fields within the input (which is a bag), and the returned bag should carry this schema. In the outputSchema method the input s

Re: Help with a tricky query

2010-06-11 Thread hc busy
Yeah, that IS hard in Pig. I'm not even sure how to do a self-join in Pig. Like you can't really say T = join Table by id1, Table by id2, Table by id3; I think PigLatin will complain that it's confused about which Table is which and which id goes with which table. I had been proposing that we allow PigLati

Re: Behavior of JOIN

2010-06-11 Thread hc busy
Oh, I see what my confusion is... It's the "null"s on which join behaves differently in Pig than in SQL. Right? That's where things are different. On Thu, Jun 10, 2010 at 12:48 PM, Alan Gates wrote: > That's already what happens, because flattening a bag that is empty results > in 0 rows, regardle