I'm noticing some really strange behavior with a CROSS operation in one of
my scripts.

I'm CROSSing a table T1 with another table T2 to produce T3. T1 has one
row, and T2 has 2,982,035 rows.

If I STORE both T1 and T2 before CROSSing them together to get T3, like so:

-- ... Long script that, among other things, creates T1 and T2 ...
STORE T1 INTO 'hdfs://namenode/x/T1' USING PigStorage(',');
STORE T2 INTO 'hdfs://namenode/x/T2' USING PigStorage(',');
T3 = CROSS T2, T1;

then I get what I expect; T3 has 2,982,035 records.

However, if I omit the STOREs and run the CROSS directly, T3 has only
1,492,977 records, roughly half of what I expect.
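
For anyone trying to reproduce this, here is a minimal sketch of one way
the record count of T3 could be checked in either variant of the script.
It is only an illustration, not necessarily how the counts above were
obtained; the aliases T3_all and T3_count are hypothetical names that do
not appear in the original script.

```pig
-- Hypothetical aliases: T3_all and T3_count are illustrative only.
-- GROUP ... ALL puts every record of T3 into a single group.
T3_all = GROUP T3 ALL;
-- COUNT_STAR counts every record in the bag, including ones whose
-- first field is null (COUNT would skip those).
T3_count = FOREACH T3_all GENERATE COUNT_STAR(T3);
DUMP T3_count;
```

Running the same check against T1 and T2 just before the CROSS might show
whether the inputs themselves differ between the two variants.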

I've run EXPLAIN on both the script with the STOREs and the script without,
and their query plans are identical.

I'm going to end up refactoring the script to get rid of the CROSS anyway,
since it's expensive, but I'm curious whether I'm doing something wrong or
whether there may be a subtle bug in CROSS.

I'm using Pig version 0.11.0-cdh4.5.0.

Any insight you could give me here would be greatly appreciated.

Thanks,
--Alex
