Hi Anil,
Is there a temporary workaround I can use to get past this for now? I am 
currently stuck on this issue and cannot proceed. Going forward we will need 
to support up to 0.5 billion transactions, not just 150M, so I really have to 
come up with a strategy that makes this work.
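
One stopgap I could try on my side is simply giving the reduce tasks more heap 
for the FPGrowth step, something along these lines (the property name below is 
the Hadoop 0.20-style one I am running, so treat this as a sketch rather than 
a verified fix):

    import org.apache.hadoop.conf.Configuration;

    // Stopgap sketch: raise the per-task JVM heap before launching the
    // FPGrowth step. "mapred.child.java.opts" is the Hadoop 0.20-era
    // property; newer releases split it into map- and reduce-specific options.
    public class HeapBump {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("mapred.child.java.opts", "-Xmx4096m"); // up from the 2G I use now
            System.out.println(conf.get("mapred.child.java.opts"));
            // ...this Configuration would then be handed to the job that runs FPGrowth.
        }
    }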

Do you think using 5000 groups would help? I thought that, after a certain 
point, increasing the number of groups would not make a difference...
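
For reference, this is roughly how I am driving it; the parameter keys below 
are from memory, so treat them as an approximation of the PFPGrowth / 
FPGrowthDriver API (and the paths are just placeholders), not the exact strings:

    import org.apache.mahout.common.Parameters;
    import org.apache.mahout.fpm.pfpgrowth.PFPGrowth;

    // Rough sketch only -- the key names ("input", "output", "minSupport",
    // "maxHeapSize", "numGroups") are my recollection of the PFPGrowth
    // parameters; please check FPGrowthDriver for the exact strings.
    public class RunPfp {
        public static void main(String[] args) throws Exception {
            Parameters params = new Parameters();
            params.set("input", "/user/praveen/transactions");  // placeholder path
            params.set("output", "/user/praveen/patterns");     // placeholder path
            params.set("minSupport", "250");
            params.set("maxHeapSize", "50");   // top-k patterns kept per feature
            params.set("numGroups", "5000");   // the value I am considering
            PFPGrowth.runPFPGrowth(params);
        }
    }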

Praveen

-----Original Message-----
From: ext Robin Anil [mailto:[email protected]] 
Sent: Tuesday, November 23, 2010 10:48 AM
To: [email protected]
Subject: Re: OutOfMemory with PFPGrowth

The reason is that some groups (those containing medium-frequency features) have 
too many branches when the data is very sparse. I have to think about whether 
there is a smart pruning mechanism for them. Maybe we could even compute a single 
conditional tree per feature for the medium-frequency features.
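
To put a very rough number on it (illustrative figures only, not measurements): 
if the conditional trees for one dense group accumulate tens of millions of 
nodes, a single reducer can exceed a 2G heap by itself.

    // Illustrative back-of-envelope only -- the node count and per-node cost
    // below are assumptions, not numbers taken from Mahout.
    public class TreeMemoryEstimate {
        public static void main(String[] args) {
            long nodes = 30000000L;   // hypothetical node count for one bushy group
            long bytesPerNode = 48L;  // rough JVM object header + fields estimate
            System.out.println((nodes * bytesPerNode) / (1024 * 1024) + " MB"); // ~1373 MB
        }
    }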

Robin


On Tue, Nov 23, 2010 at 9:05 PM, <[email protected]> wrote:

> Yes, I tried with 3000 and still hit the same issue.
>
> -----Original Message-----
> From: ext Robin Anil [mailto:[email protected]]
> Sent: Tuesday, November 23, 2010 10:32 AM
> To: [email protected]
> Subject: Re: OutOfMemory with PFPGrowth
>
> Have you tried increasing g to 1000 or above?
>
> On Tue, Nov 23, 2010 at 8:50 PM, <[email protected]> wrote:
>
> > Hello all,
> > I was able to test PFPGrowth successfully with 50M transactions. Now 
> > I am testing with 150M transactions, and no matter what group size I 
> > use, I get an OutOfMemoryError when running the FPGrowth job. The 
> > parallel counting and transaction sorting jobs finish fine, but the 
> > FPGrowth job always runs out of memory.
> >
> > On the Hadoop side: the map/reduce child heap size is 2G, and the 
> > number of reduce tasks is 24 across a 4-node Hadoop cluster.
> > On the Mahout side: I specified minSupport as 250 and tried group 
> > sizes from 500 to 3000.
> > The 150M transactions generate about 6500 features, so I thought a 
> > group size of 500 should be enough to avoid running out of memory.
> >
> > What parameters can I change to fix the OutOfMemory issue?
> > Can someone shed some light on how to come up with optimal 
> > parameter values to avoid such issues on a production system?
> >
> > Any help is appreciated.
> >
> > Praveen
> >
> > 10/11/23 10:16:52 INFO mapred.JobClient:  map 100% reduce 20%
> > 10/11/23 10:17:01 INFO mapred.JobClient:  map 100% reduce 17%
> > 10/11/23 10:17:03 INFO mapred.JobClient: Task Id :
> > attempt_201011221932_0009_r_000013_2, Status : FAILED
> > Error: Java heap space
> > 10/11/23 10:17:10 INFO mapred.JobClient:  map 100% reduce 14%
> > 10/11/23 10:17:12 INFO mapred.JobClient: Task Id :
> > attempt_201011221932_0009_r_000018_0, Status : FAILED
> > Error: Java heap space
> > 10/11/23 10:17:14 INFO mapred.JobClient:  map 100% reduce 11%
> > 10/11/23 10:17:16 INFO mapred.JobClient:  map 100% reduce 12%
> > 10/11/23 10:17:16 INFO mapred.JobClient: Task Id :
> > attempt_201011221932_0009_r_000016_1, Status : FAILED
> > Error: Java heap space
> > 10/11/23 10:17:19 INFO mapred.JobClient:  map 100% reduce 8%
> > 10/11/23 10:17:22 INFO mapred.JobClient:  map 100% reduce 9%
> > 10/11/23 10:17:25 INFO mapred.JobClient: Task Id :
> > attempt_201011221932_0009_r_000019_0, Status : FAILED
> > Error: Java heap space
> >
> >
>
