join result dataset bigger than before

2012-06-26 Thread Marco Cadetg
Hi there, I'm doing a join like this: A = LOAD '/data/sessions' USING PigStorage(',') AS (userid:chararray, client_type:chararray, flag:long); A1 = GROUP A ALL; A1 = FOREACH A1 GENERATE COUNT(A); DUMP A1 (543872) B = LOAD '/data/userdb' USING PigStorage(',') AS (uid:chararray,

Passing a BAG to Pig UDF constructor?

2012-06-26 Thread Dexin Wang
Is it possible to pass a bag to a Pig UDF constructor? Basically in the constructor I want to initialize some hash map so that on every exec operation, I can use the hashmap to do a lookup and find the value I need, and apply some algorithm to it. I realize I could just do a replicated join to

RE: Passing a BAG to Pig UDF constructor?

2012-06-26 Thread Mridul Muralidharan
You could dump the data into a DFS file and pass the file's location as a parameter to your UDF in DEFINE, so that it initializes itself from that data ... - Mridul -----Original Message----- From: Dexin Wang [mailto:wangde...@gmail.com] Sent: Tuesday, June 26, 2012 10:58 PM To:
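
A minimal sketch of that suggestion in Pig Latin. The jar name, the class `com.example.LookupUdf`, the DFS paths, and the field names are hypothetical placeholders, not taken from the thread. The UDF's Java constructor is assumed to accept a single String (the file location) and to build its hash map from that file.

```pig
-- Hypothetical jar and class; substitute your own UDF.
REGISTER lookup-udf.jar;

-- Constructor arguments given in DEFINE reach the UDF as strings,
-- so the UDF receives the file location rather than a bag.
DEFINE Lookup com.example.LookupUdf('/data/lookup/table.txt');

-- Hypothetical input relation and schema.
A = LOAD '/data/input' USING PigStorage(',') AS (key:chararray, val:long);
B = FOREACH A GENERATE key, Lookup(key) AS looked_up;
```

Because DEFINE only passes string arguments, the lookup data has to be written to a file first (for example with STORE) before the script that uses the UDF runs.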

a simple logic causes very long compiling time on pig 0.10.0

2012-06-26 Thread Danfeng Li
We found that the following simple logic causes a very long compile time in Pig 0.10.0, while with Pig 0.8.1 everything is fine. A = load 'A.txt' using PigStorage() AS (m: int); B = FOREACH A { days_str = (chararray) (m == 1 ? 31: (m == 2 ? 28: (m == 3 ? 31:

Re: a simple logic causes very long compiling time on pig 0.10.0

2012-06-26 Thread Clay B.
It's worth pointing out that Pig 0.9.2 also runs quickly; we only see the degradation with Pig 0.10.0. The degradation in performance seems to have a knee: 4 or 5 conditionals work as expected, but as presented, the script takes about 6 minutes at the Grunt prompt after hitting enter;

Re: a simple logic causes very long compiling time on pig 0.10.0

2012-06-26 Thread Jonathan Coveney
This is a great find. Please file a ticket. My guess is that there is some backtracking in the parser, which explodes for large values.

how can I distinct one field of a relation

2012-06-26 Thread Haitao Yao
Hi, how can I apply DISTINCT to only one field of a relation? Here's the demo: A = LOAD 'data' AS (a1:int,a2:int,a3:int); B = distinct A by a1; How can I do this? Haitao Yao yao.e...@gmail.com weibo: @haitao_yao Skype: haitao.yao.final

Re: how can I distinct one field of a relation

2012-06-26 Thread Jonathan Coveney
What is your desired output? Sounds like you want a group.
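
Two hedged sketches in Pig Latin, depending on the desired output, which the thread does not settle. All alias names other than `A` are illustrative. If only the distinct values of `a1` are needed, project that field and apply DISTINCT. If one full row per `a1` value is needed, group by `a1` and keep one tuple per group; which tuple survives is arbitrary unless the bag is ordered first.

```pig
A = LOAD 'data' AS (a1:int, a2:int, a3:int);

-- Case 1: distinct values of a1 only.
A1 = FOREACH A GENERATE a1;
B1 = DISTINCT A1;

-- Case 2: one whole row per a1 value.
G = GROUP A BY a1;
B2 = FOREACH G {
    one = LIMIT A 1;          -- keep a single tuple from each group's bag
    GENERATE FLATTEN(one);    -- emit it as an (a1, a2, a3) row
};
```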