I am having a problem getting Pig 0.7.0 to use a variable I add from a UDF.
Here's the basic pig script:
LOGS = LOAD '$INPUT' USING PigStorage('\t') ;
IMP_SID = FOREACH IMPRESSIONS_ONLY GENERATE *,
(($4 == 'NULL') ? null : (chararray)$4) AS my_id:chararray;
ORD_SID = FOREACH IMP_SID GENERAT
hc, self-join should just work, but if it doesn't:
split table into table_2 if 1==1, table_3 if 1==1;
OR
table_2 = foreach table generate *;
table_3 = foreach table generate *;
AND THEN
T = join table by id1, table_2 by id2, table_3 by id3;
-D
On Fri, Jun 11, 2010 at 10:59 AM, hc busy wrote:
Hi,
I have written a UDF that sorts grouped data on a given field (in my case a
date field) and returns the sorted data in a DataBag. I want my method to read
the schema of the fields in the input (which is a bag), and the returned
bag should carry the same schema.
In the outputSchema method the input s
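Leaving the schema question aside, the sorting step described above can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use the Pig API, the class name BagSortSketch and the method sortByField are hypothetical, a bag is modeled as a list of tuples and a tuple as a list of field values, and the date field is assumed to be a string that sorts correctly as text (for example yyyy-MM-dd).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of the Pig API. A "bag" is modeled as a
// list of tuples, and a "tuple" as a list of field values.
public class BagSortSketch {

    // Returns a new list with the tuples ordered by the field at fieldIndex.
    // Assumption: that field holds strings whose text order matches date
    // order, such as yyyy-MM-dd. The input list is not modified.
    public static List<List<Object>> sortByField(List<List<Object>> bag, int fieldIndex) {
        List<List<Object>> sorted = new ArrayList<>(bag);
        sorted.sort(Comparator.comparing((List<Object> t) -> (String) t.get(fieldIndex)));
        return sorted;
    }
}
```

Because only the order of the tuples changes and not their fields, the output bag can reasonably carry the same schema as the input bag, which is the property asked for above.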
Yeah, that IS hard in pig. I'm not even sure how to do a self-join in Pig.
Like you can't really say
T = join Table by id1, Table by id2, Table by id3;
I think PigLatin will complain that it can't tell which Table is which, or
which id goes with which table.
I had been proposing that we allow PigLati
Oh, I see what my confusion is... It's the nulls, on which join behaves
differently in Pig than in SQL. Right? That's where things are different.
On Thu, Jun 10, 2010 at 12:48 PM, Alan Gates wrote:
> That's already what happens, because flattening a bag that is empty results
> in 0 rows, regardle