Hi there,
I'm doing a join like this:
A = LOAD '/data/sessions' USING PigStorage(',') AS
(userid:chararray, client_type:chararray, flag:long);
A1 = GROUP A ALL;
A1 = FOREACH A1 GENERATE COUNT(A);
DUMP A1;
(543872)
B = LOAD '/data/userdb' USING PigStorage(',') AS (uid:chararray,
Is it possible to pass a bag to a Pig UDF constructor?
Basically, in the constructor I want to initialize a hash map so that on
every exec call I can use it to look up the value I need and apply some
algorithm to it.
I realize I could just do a replicated join to
You could dump the data into a DFS file and pass the file's location as a
parameter to your UDF in DEFINE, so that it initializes itself from that data.
- Mridul
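A minimal sketch of the DEFINE approach described above. The jar name `myudfs.jar`, the UDF class `com.example.pig.LookupUdf`, and the path `/tmp/lookup_table` are hypothetical placeholders, and the lookup file is assumed to have already been written to DFS by an earlier job. Pig passes the quoted string in the DEFINE statement to the UDF's constructor as a String argument.

```pig
-- Hypothetical jar, class and path; the lookup file must already exist on DFS
REGISTER myudfs.jar;
-- The string literal is handed to the LookupUdf(String) constructor
DEFINE Lookup com.example.pig.LookupUdf('/tmp/lookup_table');

A = LOAD '/data/sessions' USING PigStorage(',')
    AS (userid:chararray, client_type:chararray, flag:long);
-- Each call can consult the map the UDF built from the file
B = FOREACH A GENERATE userid, Lookup(userid);
```

Because Pig may also instantiate the UDF on the client while it plans the script, such a UDF would typically read the file lazily on its first exec() call and cache the map, rather than doing the read in the constructor itself.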
-----Original Message-----
From: Dexin Wang [mailto:wangde...@gmail.com]
Sent: Tuesday, June 26, 2012 10:58 PM
To:
We found that the following simple logic causes a very long compile time in
Pig 0.10.0, while with Pig 0.8.1 everything is fine.
A = load 'A.txt' using PigStorage() AS (m: int);
B = FOREACH A {
days_str = (chararray)
(m == 1 ? 31:
(m == 2 ? 28:
(m == 3 ? 31:
It's worth pointing out that Pig 0.9.2 also runs quickly; we only see the
degradation with Pig 0.10.0.
The performance degradation seems to have a knee: 4 or 5 conditionals work
as expected, but as presented, the script takes about 6 minutes at the Grunt
prompt after hitting Enter.
This is a great find. Please file a ticket.
My guess is that there is some backtracking in the parser, which blows up
as the number of nested conditionals grows.
2012/6/26 Clay B. c...@clayb.net
Hi,
How can I apply DISTINCT to only one field of a relation?
Here's an example:
A = LOAD 'data' AS (a1:int,a2:int,a3:int);
B = distinct A by a1;
How can I do this?
Haitao Yao
yao.e...@gmail.com
weibo: @haitao_yao
Skype: haitao.yao.final
What is your desired output? Sounds like you want a group.
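Following the suggestion to use a group, here is a sketch of two common readings of "distinct by a1", reusing the schema from the question. The relation names other than `A` are illustrative.

```pig
A = LOAD 'data' AS (a1:int, a2:int, a3:int);

-- Reading 1: just the distinct values of a1
a1_only = FOREACH A GENERATE a1;
distinct_a1 = DISTINCT a1_only;
DUMP distinct_a1;

-- Reading 2: one whole row per distinct a1 (which row survives is arbitrary)
by_a1 = GROUP A BY a1;
one_per_a1 = FOREACH by_a1 {
    picked = LIMIT A 1;
    GENERATE FLATTEN(picked);
};
DUMP one_per_a1;
```

If a specific row should win (for example, the one with the largest a2), an ORDER inside the nested block before the LIMIT would make the choice deterministic.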
2012/6/26 Haitao Yao yao.e...@gmail.com