Hey all,

I have one file A with a 'day' column like "2011/3/2" and another file B with
a 'timestamp' column like "2011/3/2 12:32". I want to join the records on
these two fields.
I do something like this:

A_and_B = JOIN A BY (tracking_id, day) LEFT OUTER,
               B BY (tracking_id, STRSPLIT(timestamp, ' ', 2).$0);

where you can see I am projecting out the first element of the tuple
returned by strsplit...

When I run this I get an error of the form:
    org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: HASH_JOIN
    ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false.
If I pass that property on the command line, before the "-x local", the join
appears to work. Yay.

I am happy that things seem to be working, though I would appreciate some
feedback from those in the know as to why disabling the new logical plan
fixes this and whether there is a more canonical way of doing this join.
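
For what it's worth, one alternative I could try is to compute the day in a
FOREACH first, so that the JOIN only ever sees plain fields. This is an
untested sketch: B_keyed is just a name I made up, and I am assuming that
STRSPLIT's limit argument behaves like the one in Java's String.split (so a
limit of 2 leaves the day in $0) and that 'day' has the same type on both
sides.

    -- Derive the join key up front instead of inside the JOIN clause.
    -- B_keyed is a made-up alias; limit 2 assumes Java split semantics.
    B_keyed = FOREACH B GENERATE *, STRSPLIT(timestamp, ' ', 2).$0 AS day;

    -- Join on plain fields only; LEFT OUTER keeps A rows with no match in B.
    A_and_B = JOIN A BY (tracking_id, day) LEFT OUTER,
                   B_keyed BY (tracking_id, day);

I have not checked whether this avoids the 2042 error with the new logical
plan left enabled.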

thanks,
daniel
