Unfortunately, the Enumerate UDF from DataFu would not work in this case.
The UDF works on bags, and in this case we want to enumerate a relation.
Implementing RANK correctly is very tricky. I'm not even sure
it's doable using only Pig operators, UDFs or macros. The best option is
p
Hello,
There is a similar UDF in DataFu named Enumerate.
http://datafu.incubator.apache.org/docs/datafu/1.2.0/datafu/pig/bags/Enumerate.html
I hope it helps.
James
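To make the bag-versus-relation distinction concrete, here is a plain-Python model (not DataFu's implementation) of a bag-level enumerate. It assumes the index is appended as the last field and starts at 0. `group_all` is a hypothetical helper standing in for GROUP ... ALL, which is what you would need to turn a whole relation into one bag, at the cost of funnelling every tuple into a single group.

```python
# Plain-Python model of enumerating the tuples of ONE bag (not DataFu code).
# Assumption: the index is appended as the last field, starting at `start`.
from typing import List, Tuple


def enumerate_bag(bag: List[Tuple], start: int = 0) -> List[Tuple]:
    """Return a new bag where each tuple has its position appended."""
    return [tup + (i,) for i, tup in enumerate(bag, start)]


def group_all(relation: List[Tuple]) -> List[Tuple]:
    """Hypothetical model of GROUP ... ALL: the relation collapses into one bag."""
    return [("all", list(relation))]


a = [(1, 2, 3, 4, 5), (1, 2, 4, 5, 7), (1, 5, 7, 8, 9)]
grouped = group_all(a)                  # one group holding every tuple
numbered = enumerate_bag(grouped[0][1])
print(numbered)  # [(1, 2, 3, 4, 5, 0), (1, 2, 4, 5, 7, 1), (1, 5, 7, 8, 9, 2)]
```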
In that situation you could write a script that appends the same value
that RANK would add, and STREAM the ordered relations through it.
I'm assuming both of these relations have a meaningful order.
After that, join as you would after RANK.
I'm not at a computer so can't type up an example
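A minimal sketch of the kind of script described above, assuming Pig's default streaming format (one tuple per line, fields separated by tabs). `number_rows` and `main` are hypothetical names. The numbering is only global if a single instance of the script sees the entire ordered relation; if Pig runs several copies in parallel, each copy restarts at 1.

```python
"""Hypothetical STREAM helper: prefix each record with a 1-based row number,
mimicking the column RANK would add. Assumes tab-delimited records, one per line."""
import sys
from typing import Iterable, Iterator, TextIO


def number_rows(lines: Iterable[str], start: int = 1) -> Iterator[str]:
    """Yield each record with its row number prepended as a new first field."""
    for n, line in enumerate(lines, start):
        record = line.rstrip("\n")
        yield f"{n}\t{record}"


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Read records from stdin and write the numbered records to stdout."""
    for out in number_rows(stdin):
        stdout.write(out + "\n")
```

To use it under STREAM, the file would end with `if __name__ == "__main__": main()`, be shipped to the cluster with `DEFINE ... SHIP(...)`, and be invoked with `STREAM ... THROUGH`; the output would then be joined on its first field.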
I don't think my version of Pig supports the RANK operator; I keep getting
an Internal Error. I would upgrade it, but I am not in control of the cluster.
On Tue, Mar 25, 2014 at 4:16 PM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:
> John's answer about RANK sounds like it should solve your
CROSS is by definition a very very expensive operation. Regardless, CROSS
is the wrong operator for what you're trying to do.
As was suggested by others, you want to RANK the relations then do a JOIN
by the rank.
On Tue, Mar 25, 2014 at 1:27 PM, wrote:
> Here is how to use rank and join for th
Here is how to use rank and join for this problem:
sh cat xxx
1,2,3,4,5
1,2,4,5,7
1,5,7,8,9
sh cat yyy
10,11
10,12
10,13
a= load 'xxx' using PigStorage(',');
b= load 'yyy' using PigStorage(',');
a2 = rank a;
b2 = rank b;
c = join a2 by $0, b2 by $0;
c2 = order c by $6;
c3 = foreach c2 generate $1 .. $5, $7, $8; -- drop the rank columns $0 and $6
dump c3;
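A plain-Python model of what the RANK-then-JOIN script computes, assuming both lists are already in the order RANK would see them. `rank` prepends a 1-based position as field $0, and `join_on_rank` matches rows with equal positions and then drops both rank columns. The helper names are illustrative, not Pig APIs.

```python
# Plain-Python model of RANK each relation, then JOIN on the rank column.
from typing import Dict, List, Tuple


def rank(relation: List[Tuple]) -> List[Tuple]:
    """Mimic RANK without BY: prepend a 1-based position as field $0."""
    return [(i,) + t for i, t in enumerate(relation, 1)]


def join_on_rank(left: List[Tuple], right: List[Tuple]) -> List[Tuple]:
    """Inner equi-join on $0 using a hash table, then drop both rank columns."""
    right_by_rank: Dict[int, Tuple] = {t[0]: t for t in right}
    out = []
    for row in left:
        match = right_by_rank.get(row[0])
        if match is not None:            # unmatched ranks are dropped
            out.append(row[1:] + match[1:])
    return out


a = [(1, 2, 3, 4, 5), (1, 2, 4, 5, 7), (1, 5, 7, 8, 9)]
b = [(10, 11), (10, 12), (10, 13)]
result = join_on_rank(rank(a), rank(b))
for t in result:
    print(t)
```

With three rows on each side this yields exactly the three rows requested later in the thread, whereas CROSS would produce nine. Pig does not guarantee join output order, which is why the script orders by a rank column.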
John's answer about RANK sounds like it should solve your problem
> On Mar 25, 2014, at 1:13 PM, Christopher Surage wrote:
>
> @ pradeep, I know what the cross product will do, but I have many lines in
> many files. So the cross will take far too long to complete.
>
>
> On Tue, Mar 25, 2014 at
@Pradeep, I know what the cross product will do, but I have many lines in
many files, so the cross will take far too long to complete.
On Tue, Mar 25, 2014 at 3:58 PM, Pradeep Gollakota wrote:
> I don't understand what you're trying to do from your example.
>
> If you perform a cross on the dat
yes
On Tue, Mar 25, 2014 at 4:07 PM, Shahab Yunus wrote:
> Oh, sorry. This new example is something different from what I understood
> before. I thought you were only trying to append one relation (with one
> tuple) to another (which has more than one tuple).
>
> So essentially you want to loop
Try this: http://pig.apache.org/docs/r0.11.0/basic.html#rank
Rank each data set then join on the rank.
On Tue, Mar 25, 2014 at 4:03 PM, Christopher Surage wrote:
> The output I would like to see is
>
> (1,2,3,4,5,10,11)
> (1,2,4,5,7,10,12)
> (1,5,7,8,9,10,13)
>
>
> On Tue, Mar 25, 2014 at 3:58 P
Oh, sorry. This new example is something different from what I understood
before. I thought you were only trying to append one relation (with one
tuple) to another (which has more than one tuple).
So essentially you want to loop over two collections and combine their tuples.
Are they always going to
The output I would like to see is
(1,2,3,4,5,10,11)
(1,2,4,5,7,10,12)
(1,5,7,8,9,10,13)
On Tue, Mar 25, 2014 at 3:58 PM, Pradeep Gollakota wrote:
> I don't understand what you're trying to do from your example.
>
> If you perform a cross on the data you have, the output will be the
> following:
I don't understand what you're trying to do from your example.
If you perform a cross on the data you have, the output will be the
following:
(1,2,3,4,5,10,11)
(1,2,3,4,5,10,11)
(1,2,3,4,5,10,11)
(1,2,4,5,7,10,11)
(1,2,4,5,7,10,11)
(1,2,4,5,7,10,11)
(1,5,7,8,9,10,11)
(1,5,7,8,9,10,11)
(1,5,7,8,9,
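A quick way to see why CROSS is expensive: it pairs every tuple of the first relation with every tuple of the second, so the output has |A| × |B| rows. This Python model uses `itertools.product` and, as an assumption, the (10,11), (10,12), (10,13) data from later in the thread rather than the data behind the listing above.

```python
# Plain-Python model of CROSS: every left tuple paired with every right tuple.
from itertools import product
from typing import List, Tuple


def cross(left: List[Tuple], right: List[Tuple]) -> List[Tuple]:
    """Concatenate each (left, right) pair; output size is len(left) * len(right)."""
    return [lt + rt for lt, rt in product(left, right)]


a = [(1, 2, 3, 4, 5), (1, 2, 4, 5, 7), (1, 5, 7, 8, 9)]
b = [(10, 11), (10, 12), (10, 13)]
pairs = cross(a, b)
print(len(pairs))  # 9
```

The three rows actually wanted are only a subset of these nine, and the gap grows quadratically with input size.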
Have you tried iterating over the first relation and, in the nested
*generate* clause, always appending the second relation? Your top-level
loop is over the first relation, but in the nested block you effectively
hard-code appending the second relation.
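If the second relation really holds a single tuple (the case Shahab initially assumed), appending it to every row needs no CROSS at all. Below is a Python model of that idea; `append_scalar` is a hypothetical helper that refuses inputs with more than one tuple, since with several tuples the problem becomes the row-by-row pairing discussed elsewhere in this thread.

```python
# Plain-Python model: append a one-tuple relation to every row of another.
from typing import List, Tuple


def append_scalar(left: List[Tuple], right: List[Tuple]) -> List[Tuple]:
    """Append the single tuple of `right` to each tuple of `left`."""
    if len(right) != 1:
        raise ValueError("second relation must contain exactly one tuple")
    (only,) = right
    return [t + only for t in left]


a = [(1, 2, 3, 4, 5), (1, 2, 4, 5, 7), (1, 5, 7, 8, 9)]
print(append_scalar(a, [(10, 11)]))
```

In Pig itself, a relation with exactly one tuple can usually be referenced as a scalar inside a FOREACH (for example `b.$0`), assuming your Pig version supports scalar projections.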
I am referring to the examples like in "Example:
I am trying to perform the following action, but the only solution I have
been able to come up with is a CROSS, which I don't want to use because it
is a very expensive operation.
(1,2,3,4,5) (10,11)
(1,2,4,5,7) (10,11)
(1,5,7,8,9) (10,11)
I want to make it
Hi:
I have a record with a union type:
union {TypeA, TypeB, TypeC, TypeD, TypeE} mydata;
I have the serialized data in Avro format; however, when I try to use
piggybank.jar's AvroStorage function to load the Avro data, it gives me the
following error:
Caused by: java.io.IOException: We don't
Sadly, I was not able to attend the last Bay Area user meetup at LinkedIn, held
on March 14. I'm very interested in seeing some of the presentations, so
I'm wondering whether there are plans to publish the recordings.
Jarcec
I hit https://issues.apache.org/jira/browse/PIG-3512
On 24/03/2014 14:40, Vincent Barat wrote:
Hi,
Since I moved from Pig 0.10.0 to 0.11.0 or 0.12.0, the estimation
of the number of reducers no longer works.
My script:
A = load 'data';
B = group A by $0;
store B into 'out';
My data:
gru
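For context on the reducer question, here is a sketch of the input-size heuristic Pig applies when neither PARALLEL nor default_parallel is set. It assumes the documented defaults of `pig.exec.reducers.bytes.per.reducer` (1,000,000,000 bytes) and `pig.exec.reducers.max` (999). It models only the arithmetic and does not reproduce the regression reported here.

```python
# Sketch of input-size-based reducer estimation (assumed defaults, see lead-in).

def estimate_reducers(input_bytes: int,
                      bytes_per_reducer: int = 1_000_000_000,
                      max_reducers: int = 999) -> int:
    """ceil(input / bytes_per_reducer), clamped to the range [1, max_reducers]."""
    needed = -(-input_bytes // bytes_per_reducer)   # integer ceiling division
    return min(max_reducers, max(1, needed))


print(estimate_reducers(2_500_000_000))  # 3
```

If the estimator sees a total input size of zero (for example, because it cannot size the input), this arithmetic collapses to a single reducer, which is one plausible symptom to check for.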
Hi All,
I am reading an HBase table as follows:
A = LOAD 'APE1_RATED_EVENT' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('', '-loadKey true') AS
(id:bytearray);
B = GROUP A BY id;
X = FOREACH B GENERATE COUNT_STAR(A);
DUMP X;
The job failed, and I found the following error in the Hadoop task l