CROSS is by definition a very expensive operation: it pairs every tuple of one relation with every tuple of the other. Regardless, CROSS is the wrong operator for what you're trying to do.
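To make the difference concrete, here is a rough Python model (not Pig, and not how Pig implements either operator) of a cross product versus pairing rows by position. The lists `xxx` and `yyy` are toy stand-ins for the two files in this thread, and the helper names `cross` and `rank_join` are made up for illustration.

```python
from itertools import product

# Toy stand-ins for the two relations in this thread (assumed data, not Pig objects).
xxx = [(1, 2, 3, 4, 5), (1, 2, 4, 5, 7), (1, 5, 7, 8, 9)]
yyy = [(10, 11), (10, 12), (10, 13)]


def cross(a, b):
    """Model of a cross product: every row of a combined with every row of b."""
    return [x + y for x, y in product(a, b)]


def rank_join(a, b):
    """Model of rank-then-join: number the rows of each input, then match on that number."""
    ranked_a = {rank: row for rank, row in enumerate(a, start=1)}
    ranked_b = {rank: row for rank, row in enumerate(b, start=1)}
    return [ranked_a[r] + ranked_b[r] for r in sorted(ranked_a) if r in ranked_b]


crossed = cross(xxx, yyy)
joined = rank_join(xxx, yyy)

print(len(crossed))  # 9 rows: len(xxx) * len(yyy)
for row in joined:   # 3 rows, one per rank
    print(row)
```

The cross product grows as the product of the two input sizes, while matching by row number produces at most one output row per input row, which is the shape of output being asked for here.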
As was suggested by others, you want to RANK the relations and then JOIN them by rank.

On Tue, Mar 25, 2014 at 1:27 PM, <william.dowl...@thomsonreuters.com> wrote:

> Here is how to use rank and join for this problem:
>
> sh cat xxx
> 1,2,3,4,5
> 1,2,4,5,7
> 1,5,7,8,9
>
> sh cat yyy
> 10,11
> 10,12
> 10,13
>
>
> a = load 'xxx' using PigStorage(',');
> b = load 'yyy' using PigStorage(',');
>
> a2 = rank a;
> b2 = rank b;
>
> c = join a2 by $0, b2 by $0;
> c2 = order c by $6;
> c3 = foreach c2 generate $1 .. $5, $7 ..;
>
> dump c3
> (1,2,3,4,5,10,11)
> (1,2,4,5,7,10,12)
> (1,5,7,8,9,10,13)
>
>
> William F Dowling
> Senior Technologist
> Thomson Reuters
>
>
> -----Original Message-----
> From: Christopher Surage [mailto:csur...@gmail.com]
> Sent: Tuesday, March 25, 2014 4:03 PM
> To: user@pig.apache.org
> Subject: Re: Any way to join two aliases without using CROSS
>
> The output I would like to see is
>
> (1,2,3,4,5,10,11)
> (1,2,4,5,7,10,12)
> (1,5,7,8,9,10,13)
>
>
> On Tue, Mar 25, 2014 at 3:58 PM, Pradeep Gollakota <pradeep...@gmail.com> wrote:
>
> > I don't understand what you're trying to do from your example.
> >
> > If you perform a cross on the data you have, the output will be the
> > following:
> >
> > (1,2,3,4,5,10,11)
> > (1,2,3,4,5,10,11)
> > (1,2,3,4,5,10,11)
> > (1,2,4,5,7,10,11)
> > (1,2,4,5,7,10,11)
> > (1,2,4,5,7,10,11)
> > (1,5,7,8,9,10,11)
> > (1,5,7,8,9,10,11)
> > (1,5,7,8,9,10,11)
> >
> > On this, you'll have to do a distinct to get what you're looking for.
> >
> > Let's change the example a little bit so we get a clearer understanding
> > of your problem. What would be the output if your two relations looked as
> > follows:
> >
> > (1,2,3,4,5) (10,11)
> > (1,2,4,5,7) (10,12)
> > (1,5,7,8,9) (10,13)
> >
> >
> > On Tue, Mar 25, 2014 at 12:18 PM, Shahab Yunus <shahab.yu...@gmail.com> wrote:
> >
> > > Have you tried iterating over the first relation and in the nested
> > > *generate* clause, always appending the second relation? Your top level
> > > looping is on first relation but in the nested block you are sort of
> > > hardcoding appending of second relation.
> > >
> > > I am referring to the examples like in "Example: Nested Blocks" section
> > > http://pig.apache.org/docs/r0.10.0/basic.html#foreach
> > >
> > >
> > > On Tue, Mar 25, 2014 at 3:01 PM, Christopher Surage <csur...@gmail.com> wrote:
> > >
> > > > I am trying to perform the following action, but the only solution I have
> > > > been able to come up with is using a CROSS, but I don't want to use that
> > > > statement as it is a very expensive process.
> > > >
> > > > (1,2,3,4,5) (10,11)
> > > > (1,2,4,5,7) (10,11)
> > > > (1,5,7,8,9) (10,11)
> > > >
> > > >
> > > > I want to make it
> > > > (1,2,3,4,5,10,11)
> > > > (1,2,4,5,7,10,11)
> > > > (1,5,7,8,9,10,11)
> > > >
> > > > any help would be much appreciated,
> > > >
> > > > Chris