Re: Any way to join two aliases without using CROSS

Andrew Musselman Tue, 25 Mar 2014 14:06:12 -0700

In that situation you could write a script that tacks on the same kind of row
number that RANK would, and stream the ordered relations through it.


I'm assuming both of these relations have a well-defined order.

After that, join on the added value just as you would after RANK.

I'm not at a computer so can't type up an example.
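
A minimal sketch of what such a script might look like, assuming Python is
available on the cluster nodes and each relation is streamed as tab-separated
lines. The file name add_rownum.py and the function name number_lines are
hypothetical. It prepends a 1-based row number to every line it reads, which
is roughly the value RANK would have supplied:

```python
#!/usr/bin/env python
"""Prepend a 1-based row number to every line read from stdin.

Hypothetical helper (add_rownum.py) meant to be invoked from Pig with
STREAM ... THROUGH on a relation that has already been ordered.
"""
import sys


def number_lines(lines, start=1, sep="\t"):
    """Yield each input line prefixed with a running row number."""
    for row_number, line in enumerate(lines, start):
        # Drop the trailing newline so the caller controls line endings.
        yield "{0}{1}{2}".format(row_number, sep, line.rstrip("\n"))


if __name__ == "__main__":
    for numbered in number_lines(sys.stdin):
        print(numbered)
```

Each running copy of the script counts independently, so the numbers only line
up across the two relations if each relation passes through a single instance
of the script in the intended order. With the row number as the first field of
both streamed relations, an ordinary JOIN on that field pairs row 1 with row 1,
row 2 with row 2, and so on.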

> On Mar 25, 2014, at 1:57 PM, Christopher Surage <csur...@gmail.com> wrote:
> 
> I don't think my version of Pig supports the RANK function; I keep getting an
> Internal Error. I would update it, but I am not in control of the cluster.
> 
> 
> On Tue, Mar 25, 2014 at 4:16 PM, Andrew Musselman <
> andrew.mussel...@gmail.com> wrote:
> 
>> John's answer about RANK sounds like it should solve your problem
>> 
>>> On Mar 25, 2014, at 1:13 PM, Christopher Surage <csur...@gmail.com>
>>> wrote:
>>> 
>>> @Pradeep, I know what the cross product will do, but I have many lines in
>>> many files. So the cross will take far too long to complete.
>>> 
>>> 
>>> On Tue, Mar 25, 2014 at 3:58 PM, Pradeep Gollakota <pradeep...@gmail.com>
>>> wrote:
>>> 
>>>> I don't understand what you're trying to do from your example.
>>>> 
>>>> If you perform a cross on the data you have, the output will be the
>>>> following:
>>>> 
>>>> (1,2,3,4,5,10,11)
>>>> (1,2,3,4,5,10,11)
>>>> (1,2,3,4,5,10,11)
>>>> (1,2,4,5,7,10,11)
>>>> (1,2,4,5,7,10,11)
>>>> (1,2,4,5,7,10,11)
>>>> (1,5,7,8,9,10,11)
>>>> (1,5,7,8,9,10,11)
>>>> (1,5,7,8,9,10,11)
>>>> 
>>>> On this, you'll have to do a distinct to get what you're looking for.
>>>> 
>>>> Let's change the example a little bit so we get a clearer understanding
>>>> of your problem. What would be the output if your two relations looked as
>>>> follows:
>>>> 
>>>> (1,2,3,4,5)          (10,11)
>>>> (1,2,4,5,7)          (10,12)
>>>> (1,5,7,8,9)          (10,13)
>>>> 
>>>> 
>>>> On Tue, Mar 25, 2014 at 12:18 PM, Shahab Yunus <shahab.yu...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Have you tried iterating over the first relation and, in the nested
>>>>> *generate* clause, always appending the second relation? The top-level
>>>>> loop runs over the first relation, while the nested block effectively
>>>>> hardcodes appending the second relation.
>>>>> 
>>>>> I am referring to examples like the one in the "Example: Nested Blocks"
>>>>> section:
>>>>> http://pig.apache.org/docs/r0.10.0/basic.html#foreach
>>>>> 
>>>>> 
>>>>> On Tue, Mar 25, 2014 at 3:01 PM, Christopher Surage <csur...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> I am trying to perform the following action, but the only solution I have
>>>>>> been able to come up with uses a CROSS, and I don't want to use that
>>>>>> statement as it is a very expensive process.
>>>>>> 
>>>>>> (1,2,3,4,5)          (10,11)
>>>>>> (1,2,4,5,7)          (10,11)
>>>>>> (1,5,7,8,9)          (10,11)
>>>>>> 
>>>>>> 
>>>>>> I want to make it
>>>>>> (1,2,3,4,5,10,11)
>>>>>> (1,2,4,5,7,10,11)
>>>>>> (1,5,7,8,9,10,11)
>>>>>> 
>>>>>> Any help would be much appreciated,
>>>>>> 
>>>>>> Chris
>>
