Re: java.lang.OutOfMemoryError while running Pig Job

2011-05-23 Thread sonia gehlot
Hi All, the script worked for me after setting the following property: *set pig.exec.nocombiner true;* Thanks for your help. Sonia On Mon, May 23, 2011 at 3:03 PM, Dmitriy Ryaboy wrote: > you can group by your key + the thing you want distinct counts of, and > generate counts of those. > > On Mon,

Re: java.lang.OutOfMemoryError while running Pig Job

2011-05-23 Thread Dmitriy Ryaboy
you can group by your key + the thing you want distinct counts of, and generate counts of those. On Mon, May 23, 2011 at 2:17 PM, sonia gehlot wrote: > Hi Shawn, > > I tried using SUBSTRING in my script with different combinations but still > getting OOM errors. > > is there any other alternat
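The suggestion above can be modelled outside Pig to see why it yields a distinct count. The following is a minimal Python sketch, not Pig code and not taken from the thread: the function name `distinct_counts_via_pairs` and the sample rows are hypothetical. Grouping by (key, value) collapses duplicate pairs, so counting the surviving pairs per key gives the number of distinct values for that key.

```python
from collections import Counter


def distinct_counts_via_pairs(rows):
    """Model of 'group by key + value, then count' (hypothetical helper).

    rows is an iterable of (key, value) tuples.
    """
    # Grouping by (key, value) leaves exactly one entry per unique pair.
    unique_pairs = set(rows)
    # Counting the remaining pairs per key gives the distinct count per key.
    return dict(Counter(key for key, _value in unique_pairs))


# Hypothetical sample data: user "u1" is seen twice under key "home".
sample_rows = [
    ("home", "u1"),
    ("home", "u1"),
    ("home", "u2"),
    ("search", "u1"),
]
pair_counts = distinct_counts_via_pairs(sample_rows)
```

In Pig terms, the first step corresponds to grouping on the composite key and the second to a plain count per key, so no single bag has to hold every value for one key while a distinct is applied.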

Re: java.lang.OutOfMemoryError while running Pig Job

2011-05-23 Thread sonia gehlot
Hi Shawn, I tried using SUBSTRING in my script with different combinations but I am still getting OOM errors. Is there any other way to compute a distinct count against a very large set of data? Thanks, Sonia On Fri, May 20, 2011 at 1:54 PM, Xiaomeng Wan wrote: > It serves two purposes: > 1.

Re: java.lang.OutOfMemoryError while running Pig Job

2011-05-20 Thread Xiaomeng Wan
It serves two purposes: 1. divide the group into smaller subgroups 2. make sure distinct in subgroup => distinct in group Shawn On Fri, May 20, 2011 at 2:20 PM, sonia gehlot wrote: > Hey, I am sorry but I didn't get how substring will help in this? > > On Fri, May 20, 2011 at 1:08 PM, Xiaomeng W
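The two purposes listed above can be checked with a small model. This is a Python sketch of the logic only, not Pig code; the function names, the two-character prefix, and the sample rows are assumptions made for illustration. Values in different prefix subgroups cannot be equal, because equal strings share a prefix, so the per-subgroup distinct counts can simply be summed.

```python
from collections import defaultdict


def distinct_counts_direct(rows):
    """Baseline: one set per key, which is what becomes too large in one bag."""
    seen = defaultdict(set)
    for key, value in rows:
        seen[key].add(value)
    return {key: len(values) for key, values in seen.items()}


def distinct_counts_by_prefix(rows, prefix_len=2):
    """Divide and conquer: distinct within (key, prefix) subgroups, then sum."""
    # Purpose 1: split each key's values into smaller subgroups by prefix.
    subgroups = defaultdict(set)
    for key, value in rows:
        subgroups[(key, value[:prefix_len])].add(value)
    # Purpose 2: a value belongs to exactly one prefix subgroup, so the
    # subgroup distinct counts add up to the distinct count for the key.
    totals = defaultdict(int)
    for (key, _prefix), values in subgroups.items():
        totals[key] += len(values)
    return dict(totals)


# Hypothetical sample data.
prefix_rows = [
    ("a", "apple"),
    ("a", "apple"),
    ("a", "apricot"),
    ("a", "banana"),
    ("b", "cherry"),
    ("b", "cherry"),
]
direct_result = distinct_counts_direct(prefix_rows)
prefix_result = distinct_counts_by_prefix(prefix_rows)
```

The split only helps if the prefix spreads values reasonably evenly; if most values share the same first two characters, one subgroup stays nearly as large as the original group.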

Re: java.lang.OutOfMemoryError while running Pig Job

2011-05-20 Thread sonia gehlot
Hey, I am sorry but I didn't get how substring will help in this? On Fri, May 20, 2011 at 1:08 PM, Xiaomeng Wan wrote: > you can try using some divide and conquer, like this: > > a = group data by (key, SUBSTRING(the_field_to_be_distinct, 0, 2)); > b = foreach a { x = distinct data.the_field_to_be_di

Re: java.lang.OutOfMemoryError while running Pig Job

2011-05-20 Thread Xiaomeng Wan
you can try using some divide and conquer, like this: a = group data by (key, SUBSTRING(the_field_to_be_distinct, 0, 2)); b = foreach a { x = distinct data.the_field_to_be_distinct; generate group.key as key, COUNT(x) as cnt; } c = group b by key; d = foreach c generate group as key, SUM(b.cnt) as cnt

Re: java.lang.OutOfMemoryError while running Pig Job

2011-05-20 Thread sonia gehlot
Hey Thejas, I tried setting the property pig.cachedbag.memusage to 0.1 and also tried computing the distinct count for each type separately, but I am still getting errors like Error: java.lang.OutOfMemoryError: Java heap space Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded java.io.IO

Re: java.lang.OutOfMemoryError while running Pig Job

2011-05-13 Thread Thejas M Nair
The stack trace shows that the OOM error is happening when the distinct is being applied. It looks like in some record(s) of the relation group_it, one or more of the following bags is very large - logic.c_users, logic.nc_users or logic.registered_users; Try setting the property pig.cachedbag.memusa

java.lang.OutOfMemoryError while running Pig Job

2011-05-12 Thread sonia gehlot
Hi Guys, I am running the following Pig script in Pig 0.8: page_events = LOAD '/user/sgehlot/day=2011-05-10' as (event_dt_ht:chararray,event_dt_ut:chararray,event_rec_num:int,event_type:int, client_ip_addr:long,hub_id:int,is_cookied_user:int,local_ontology_node_id:int, page_type_id:int,content