Hi All,
This script worked for me after setting the following property:
*set pig.exec.nocombiner true;*
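The statement just goes at the top of the script, before the LOAD statement, e.g.:
set pig.exec.nocombiner true;  -- turn off the map-side combiner for this script
-- ... rest of the script, as posted at the bottom of this thread, unchanged ...
My understanding is that with the combiner disabled, the distinct bags are no longer built up on the map side, which may be what was running out of heap here.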
Thanks for your help.
Sonia
On Mon, May 23, 2011 at 3:03 PM, Dmitriy Ryaboy wrote:
you can group by your key + the thing you want distinct counts of, and
generate counts of those.
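A rough Pig Latin sketch of that approach (relation and field names are just placeholders, borrowed from Shawn's example further down the thread):
pairs    = foreach data generate key, the_field_to_be_distinct;
by_pair  = group pairs by (key, the_field_to_be_distinct);  -- one group per distinct (key, value) pair
one_each = foreach by_pair generate group.key as key;       -- one row per distinct (key, value) pair
by_key   = group one_each by key;
d_counts = foreach by_key generate group as key, COUNT(one_each) as cnt;  -- distinct values per key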
On Mon, May 23, 2011 at 2:17 PM, sonia gehlot wrote:
Hi Shawn,
I tried using SUBSTRING in my script with different combinations, but I am still
getting OOM errors.
Is there any other alternative for doing a distinct count against a very large
set of data?
Thanks,
Sonia
On Fri, May 20, 2011 at 1:54 PM, Xiaomeng Wan wrote:
It serves two purposes:
1. divide the group into smaller subgroups
2. make sure that distinct in a subgroup implies distinct in the whole group (a
value's first two characters fix which subgroup it lands in, so the same value is
never counted in more than one subgroup)
Shawn
On Fri, May 20, 2011 at 2:20 PM, sonia gehlot wrote:
Hey, I am sorry, but I didn't get how SUBSTRING will help with this?
On Fri, May 20, 2011 at 1:08 PM, Xiaomeng Wan wrote:
you can try using some divide and conquer, like this:
a = group data by (key, SUBSTRING(the_field_to_be_distinct, 0, 2));
b = foreach a { x = distinct data.the_field_to_be_distinct;  -- distinct within each (key, prefix) subgroup
                generate group.key as key, COUNT(x) as cnt; }
c = group b by key;
d = foreach c generate group as key, SUM(b.cnt) as cnt;  -- add the subgroup counts back up per key
Hey Thejas,
I tried setting the property pig.cachedbag.memusage to 0.1 and also tried
computing the distinct count for each type separately, but I am still getting
errors like:
Error: java.lang.OutOfMemoryError: Java heap space
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
java.io.IO
The stack trace shows that the OOM error is happening when the distinct is
being applied. It looks like in some record(s) of the relation group_it, one or
more of the following bags is very large - logic.c_users, logic.nc_users or
logic.registered_users.
Try setting the property pig.cachedbag.memusage to a lower value.
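For readers who do not have the full script (it is cut off at the bottom of this digest), the step being described is presumably a nested foreach of roughly the following shape; only group_it, logic and the three bag columns come from the message above, while the grouping key and output names are guesses:
set pig.cachedbag.memusage 0.1;  -- cap the heap fraction used for cached bags (default is 0.2)
group_it = group logic by some_key;  -- some_key is a placeholder
counts   = foreach group_it {
             cu = distinct logic.c_users;
             nc = distinct logic.nc_users;
             ru = distinct logic.registered_users;
             generate group as some_key, COUNT(cu) as c_users,
                      COUNT(nc) as nc_users, COUNT(ru) as registered_users;
           }
Each distinct above works on one group's bag at a time, which is why a single very large group can exhaust the heap.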
Hi Guys,
I am running the following Pig script on Pig 0.8:
page_events = LOAD '/user/sgehlot/day=2011-05-10' as
(event_dt_ht:chararray,event_dt_ut:chararray,event_rec_num:int,event_type:int,
client_ip_addr:long,hub_id:int,is_cookied_user:int,local_ontology_node_id:int,
page_type_id:int,content