I don't understand what you're trying to do from your example.
If you perform a cross on the data you have, the output will be the
following:
(1,2,3,4,5,10,11)
(1,2,3,4,5,10,11)
(1,2,3,4,5,10,11)
(1,2,4,5,7,10,11)
(1,2,4,5,7,10,11)
(1,2,4,5,7,10,11)
(1,5,7,8,9,10,11)
(1,5,7,8,9,10,11)
The output I would like to see is:
(1,2,3,4,5,10,11)
(1,2,4,5,7,10,12)
(1,5,7,8,9,10,13)
(In reply to Pradeep Gollakota <pradeep...@gmail.com>, Tue, Mar 25, 2014 at 3:58 PM.)
Try this: http://pig.apache.org/docs/r0.11.0/basic.html#rank
Rank each data set, then join on the rank.
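To make "rank each data set, then join on the rank" concrete, here is a minimal in-memory sketch in Python rather than Pig Latin. It assumes the two input relations are the ones implied by the desired output above, (1,2,3,4,5), (1,2,4,5,7), (1,5,7,8,9) and (10,11), (10,12), (10,13), and that the rank is simply a 1-based position in input order, which is roughly what Pig's RANK does when no BY field is given. The names `rank` and `join_on_rank` are illustrative only; they are not Pig functions.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, ...]


def rank(rows: List[Row]) -> List[Tuple[int, Row]]:
    """Pair each row with a 1-based sequential rank, in input order."""
    return [(i, row) for i, row in enumerate(rows, start=1)]


def join_on_rank(left: List[Row], right: List[Row]) -> List[Row]:
    """Inner-join two relations on their ranks and concatenate the matched rows."""
    right_by_rank: Dict[int, Row] = dict(rank(right))
    joined: List[Row] = []
    for r, row in rank(left):
        if r in right_by_rank:  # ranks with no partner are dropped, as in an inner JOIN
            joined.append(row + right_by_rank[r])
    return joined


# Example relations inferred from the desired output (an assumption).
a = [(1, 2, 3, 4, 5), (1, 2, 4, 5, 7), (1, 5, 7, 8, 9)]
b = [(10, 11), (10, 12), (10, 13)]

for t in join_on_rank(a, b):
    print(t)
```

In Pig itself the rough equivalent would be a RANK on each relation, a JOIN on the two generated rank fields, and a FOREACH that projects the rank fields away; the sketch only shows the semantics, not how Pig distributes the work.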
(In reply to Christopher Surage <csur...@gmail.com>, Tue, Mar 25, 2014 at 4:03 PM.)
yes
On Tue, Mar 25, 2014 at 4:07 PM, Shahab Yunus <shahab.yu...@gmail.com> wrote:
Oh, sorry. This new example is something different from what I understood
before. I thought you were only trying to append one relation (with one
tuple) to another (which has more than one tuple).
So essentially
@Pradeep, I know what the cross product will do, but I have many lines in
many files, so the cross will take far too long to complete.
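A quick way to see why CROSS does not scale here: it pairs every tuple of one relation with every tuple of the other, so its output has len(a) * len(b) rows, while the desired result pairs tuples by position and has only as many rows as the shorter input. With a million lines on each side, that is 10^12 crossed rows versus 10^6 paired ones. The sketch below reuses example tuples inferred from the desired output (an assumption, not data from the thread).

```python
from itertools import product

# Example relations inferred from the desired output (an assumption).
a = [(1, 2, 3, 4, 5), (1, 2, 4, 5, 7), (1, 5, 7, 8, 9)]
b = [(10, 11), (10, 12), (10, 13)]

# CROSS: every row of a paired with every row of b -> len(a) * len(b) rows.
crossed = [x + y for x, y in product(a, b)]
print(len(crossed))

# Positional pairing: row i of a with row i of b -> min(len(a), len(b)) rows.
paired = [x + y for x, y in zip(a, b)]
print(len(paired))
for t in paired:
    print(t)
```

`zip` can pair by position here only because Python lists have a fixed order in one process. A Pig relation is split across tasks with no global order, which is why an explicit sequence value (RANK, or a substitute) is needed before joining.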
John's answer about RANK sounds like it should solve your problem.
(In reply to Christopher Surage <csur...@gmail.com>, Mar 25, 2014 at 1:13 PM.)
Subject: Re: Any way to join two aliases without using CROSS
I don't think my version of Pig supports the RANK operator; I keep getting an
Internal Error. I would upgrade it, but I am not in control of the cluster.
(In reply to Andrew Musselman <andrew.mussel...@gmail.com>, Tue, Mar 25, 2014 at 4:16 PM.)
In that situation, you could write a script that appends the same sequential
value that RANK would, and STREAM the ordered relations through it.
I'm assuming both relations have a well-defined order.
After that, join on the added value as you would after RANK.
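A minimal sketch of such a numbering script, assuming Pig's default streaming format (one tab-delimited record per line on stdin and stdout) and input that already arrives in the intended order. `number_lines` and `main` are hypothetical names. One caveat: STREAM typically runs a separate copy of the script in each task, so a counter like this restarts per task; the numbers only line up between the two relations if each relation reaches a single script instance, in the same order.

```python
import sys
from typing import Iterable, Iterator, TextIO


def number_lines(lines: Iterable[str], start: int = 1) -> Iterator[str]:
    """Prepend a sequential counter, as its own tab-separated field, to each record."""
    for i, line in enumerate(lines, start=start):
        # Drop the trailing newline so the counter and the record stay on one line.
        record = line.rstrip("\r\n")
        yield f"{i}\t{record}"


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Streaming entry point: read records from stdin, write numbered records to stdout."""
    for numbered in number_lines(stdin):
        stdout.write(numbered + "\n")
```

Saved as a standalone script, it would end with the usual `if __name__ == "__main__": main()` guard and be hooked into Pig with DEFINE and STREAM ... THROUGH; the numbered outputs of both relations can then be joined on the first field.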
I'm not at a computer so can't type up an
Hello,
There is a similar UDF in DataFu named Enumerate.
http://datafu.incubator.apache.org/docs/datafu/1.2.0/datafu/pig/bags/Enumerate.html
I hope it helps.
James
Unfortunately, the Enumerate UDF from DataFu would not work in this case:
the UDF operates on bags, whereas here we want to enumerate a whole relation.
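The bag-versus-relation distinction can be illustrated outside Pig. A bag is handled inside a single UDF call, so one local counter suffices; a relation is split across tasks, so independent per-task counters restart and collide. The sketch below is a simplified model in which each inner list stands for one task's ordered slice of the relation (a hypothetical split). It shows the collision and one common two-pass fix: count rows per slice, then offset each slice's local numbers by the total size of the slices before it. This is an illustration of the idea, not a description of how Pig implements RANK internally.

```python
from itertools import accumulate
from typing import List, Tuple

Row = Tuple[int, ...]
Ranked = List[Tuple[int, Row]]


def local_enumerate(partitions: List[List[Row]]) -> Ranked:
    """Number each partition on its own, as independent per-task counters would."""
    return [(i, row) for part in partitions for i, row in enumerate(part, start=1)]


def global_rank(partitions: List[List[Row]]) -> Ranked:
    """Pass 1 counts rows per partition; pass 2 offsets each local counter
    by the number of rows in all earlier partitions."""
    sizes = [len(part) for part in partitions]
    offsets = [0] + list(accumulate(sizes))[:-1]
    ranked: Ranked = []
    for offset, part in zip(offsets, partitions):
        ranked.extend((offset + i, row) for i, row in enumerate(part, start=1))
    return ranked


# Hypothetical split of one relation into two ordered task slices.
parts = [[(1, 2, 3, 4, 5), (1, 2, 4, 5, 7)], [(1, 5, 7, 8, 9)]]
print([r for r, _ in local_enumerate(parts)])  # rank 1 appears twice
print([r for r, _ in global_rank(parts)])
```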
Implementing RANK correctly is very tricky. I'm not even sure it's doable
using only Pig operators, UDFs, or macros. Best option is