Hi there, I'm doing a join like this:
A = LOAD '/data/sessions' USING PigStorage(',') AS (userid:chararray, client_type:chararray, flag:long);
A1 = GROUP A ALL;
A1 = FOREACH A1 GENERATE COUNT(A);
DUMP A1;
(543872)

B = LOAD '/data/userdb' USING PigStorage(',') AS (uid:chararray, birth_year:int);
A = JOIN A BY userid, B BY uid;
A1 = GROUP A ALL;
A1 = FOREACH A1 GENERATE COUNT(A);
DUMP A1;
(1079122)

Now the dataset has more rows after the join than before, which is the opposite of what I expected, since not every userid in A has a matching uid in B.

Does anyone have a hint as to what the problem is?

Thanks,
-Marco