This looks like a bug. Can you please file a JIRA with steps to reproduce?

On Fri, Apr 18, 2014 at 2:45 PM, Alex Rasmussen <alex...@trifacta.com> wrote:

> I'm using PigStorage(',') for all stores.
>
> I agree about the expensiveness of CROSS, but I'm still kind of confused as
> to why it would lose records in this case.
>
> --Alex
>
>
> On Fri, Apr 18, 2014 at 2:28 PM, Pradeep Gollakota <pradeep...@gmail.com>
> wrote:
>
> > What is the storage func you're using? My guess is that there is some
> > shared state in the Storage func. Take a look at this SO that is dealing
> > with shared state in Stores.
> >
> > http://stackoverflow.com/questions/20225842/apache-pig-append-one-dataset-to-another-one/20235592#20235592
> >
> > The reason why this doesn't occur is because PigStorage doesn't have
> > shared state. So, in T3, you're loading from text files instead of your
> > original store func.
> >
> > CROSS is pretty expensive by nature. If one of your datasets is small
> > enough to load into memory, you use a fragment replicate join instead.
> >
> >
> > On Fri, Apr 18, 2014 at 11:43 AM, Alex Rasmussen <alex...@trifacta.com>
> > wrote:
> >
> > > I'm noticing some really strange behavior with a CROSS operation in one
> > > of my scripts.
> > >
> > > I'm CROSSing a table T1 with another table T2 to produce T3. T1 has one
> > > row, and T2 has 2,982,035 rows.
> > >
> > > If I STORE both T1 and T2 before CROSSing them together to get T3, like
> > > so:
> > >
> > > -- ... Long script that, among other things, creates T1 and T2 ...
> > > STORE T1 INTO 'hdfs://namenode/x/T1' USING PigStorage(',');
> > > STORE T2 INTO 'hdfs://namenode/x/T2' USING PigStorage(',');
> > > T3 = CROSS T2, T1;
> > >
> > > then I get what I expect; T3 has 2,982,035 records.
> > >
> > > However, if I omit the STOREs and run the CROSS directly, T3 only has
> > > 1,492,977 records.
> > >
> > > I've run EXPLAIN on both the script with the STOREs and the script
> > > without, and their query plans are identical.
> > >
> > > I'm going to end up refactoring the script to get rid of the CROSS
> > > anyway since it's expensive, but am curious as to whether I'm doing
> > > something wrong or if there may be a subtle bug in CROSS.
> > >
> > > I'm using Pig version 0.11.0-cdh4.5.0
> > >
> > > Any insight you could give me here would be greatly appreciated.
> > >
> > > Thanks,
> > > --Alex