The reason is that when the data is very sparse, some groups (those containing medium-frequency features) end up with too many branches. I need to think about whether there is a smart pruning mechanism for them. Maybe we could even compute a single conditional tree per feature for the medium-frequency features.
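To make the "too many branches" point concrete, here is a toy sketch. It is not Mahout's implementation. It follows the PFP paper's description: each transaction, with items sorted in F-list order, contributes to a group only its prefix up to the last item that belongs to that group, and the reducer builds one tree from all of those prefixes. The class `GroupTreeSketch` and its methods are hypothetical, and counting trie nodes is only a rough proxy for heap use (a real FP-tree node also carries a count and links).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, NOT Mahout's API. Items are ints already sorted
// by descending frequency (F-list order), as after the transaction-sorting job.
class GroupTreeSketch {

    // Minimal prefix-tree node; an FP-tree node would also hold a count and links.
    static final class Node {
        final Map<Integer, Node> children = new HashMap<>();
    }

    // Group-dependent transaction (PFP paper): the prefix of the transaction
    // up to and including the LAST item that belongs to the group.
    static List<Integer> groupPrefix(List<Integer> txn, Set<Integer> group) {
        for (int j = txn.size() - 1; j >= 0; j--) {
            if (group.contains(txn.get(j))) {
                return txn.subList(0, j + 1);
            }
        }
        return null; // this transaction contributes nothing to the group
    }

    // Inserts every group-dependent prefix into one tree and returns the
    // number of nodes created (root excluded).
    static int countTreeNodes(List<List<Integer>> txns, Set<Integer> group) {
        Node root = new Node();
        int nodes = 0;
        for (List<Integer> txn : txns) {
            List<Integer> prefix = groupPrefix(txn, group);
            if (prefix == null) {
                continue;
            }
            Node cur = root;
            for (int item : prefix) {
                Node next = cur.children.get(item);
                if (next == null) {
                    next = new Node();
                    cur.children.put(item, next);
                    nodes++;
                }
                cur = next;
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        Set<Integer> group = Set.of(5); // a medium-frequency item
        // Dense: every prefix reaches item 5 through the same frequent items.
        List<List<Integer>> dense = List.of(
            List.of(0, 1, 5), List.of(0, 1, 5), List.of(0, 1, 5));
        // Sparse: each prefix reaches item 5 through a different item;
        // items after 5 (8, 9) are cut off because they are not in the group.
        List<List<Integer>> sparse = List.of(
            List.of(0, 5), List.of(1, 5, 8), List.of(2, 5, 9));
        System.out.println(countTreeNodes(dense, group));  // 3
        System.out.println(countTreeNodes(sparse, group)); // 6
    }
}
```

With dense data the prefixes merge into one shared path; with sparse data nearly every transaction opens a new branch, so a group's tree can grow with the number of transactions even when the group holds only a few features.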
Robin

On Tue, Nov 23, 2010 at 9:05 PM, <[email protected]> wrote:

> Yes, I tried with 3000 and still got the same issue.
>
> -----Original Message-----
> From: ext Robin Anil [mailto:[email protected]]
> Sent: Tuesday, November 23, 2010 10:32 AM
> To: [email protected]
> Subject: Re: OutOfMemory with PFPGrowth
>
> Have you tried increasing g to 1000 and above?
>
> On Tue, Nov 23, 2010 at 8:50 PM, <[email protected]> wrote:
>
> > Hello all,
> > I was able to successfully test PFPGrowth with 50M transactions. Now I
> > am testing with 150M transactions, and no matter what group size I use,
> > I get out-of-memory errors when running the FPGrowth job. The parallel
> > counting and transaction sorting jobs finish fine, but the FPGrowth job
> > always runs out of memory.
> >
> > On the Hadoop side: the map/reduce process heap size is 2G, and there are
> > 24 reduce tasks on a 4-node Hadoop cluster.
> > On the Mahout side: I specified minSupport as 250 and tried group
> > sizes from 500 to 3000.
> > Out of 150M transactions, it generates about 6500 features, so I
> > thought a group size of 500 would be enough to avoid running out of memory.
> >
> > What parameters can I change to fix the out-of-memory issue?
> > Can someone shed some light on how to choose optimal parameter
> > values to avoid such issues on a production system?
> >
> > Any help is appreciated.
> >
> > Praveen
> >
> > 10/11/23 10:16:52 INFO mapred.JobClient: map 100% reduce 20%
> > 10/11/23 10:17:01 INFO mapred.JobClient: map 100% reduce 17%
> > 10/11/23 10:17:03 INFO mapred.JobClient: Task Id : attempt_201011221932_0009_r_000013_2, Status : FAILED
> > Error: Java heap space
> > 10/11/23 10:17:10 INFO mapred.JobClient: map 100% reduce 14%
> > 10/11/23 10:17:12 INFO mapred.JobClient: Task Id : attempt_201011221932_0009_r_000018_0, Status : FAILED
> > Error: Java heap space
> > 10/11/23 10:17:14 INFO mapred.JobClient: map 100% reduce 11%
> > 10/11/23 10:17:16 INFO mapred.JobClient: map 100% reduce 12%
> > 10/11/23 10:17:16 INFO mapred.JobClient: Task Id : attempt_201011221932_0009_r_000016_1, Status : FAILED
> > Error: Java heap space
> > 10/11/23 10:17:19 INFO mapred.JobClient: map 100% reduce 8%
> > 10/11/23 10:17:22 INFO mapred.JobClient: map 100% reduce 9%
> > 10/11/23 10:17:25 INFO mapred.JobClient: Task Id : attempt_201011221932_0009_r_000019_0, Status : FAILED
> > Error: Java heap space
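For the parameter question, a small sketch of the group-size arithmetic may help. It assumes the feature list is split into g groups of contiguous, equally sized blocks of the frequency-sorted list, rounded up; that split is an assumption for illustration, and `GroupSizing`, `featuresPerGroup`, and `groupOf` are hypothetical names, not Mahout's code. The 6500-feature figure comes from the message above.

```java
// Hypothetical sketch, NOT Mahout's code: assumes the frequency-sorted
// feature list is cut into numGroups contiguous blocks of equal size.
class GroupSizing {

    // Features per group, rounded up so that every feature lands in a group.
    static int featuresPerGroup(int numFeatures, int numGroups) {
        return (numFeatures + numGroups - 1) / numGroups;
    }

    // Group id of the feature at position `rank` in the sorted feature list.
    static int groupOf(int rank, int numFeatures, int numGroups) {
        return rank / featuresPerGroup(numFeatures, numGroups);
    }

    public static void main(String[] args) {
        int numFeatures = 6500; // figure reported in the thread
        for (int g : new int[] {500, 1000, 3000}) {
            System.out.printf("g=%d -> %d features per group%n",
                g, featuresPerGroup(numFeatures, g));
        }
    }
}
```

Under this assumption, going from 500 to 3000 groups shrinks each group from 13 features to 3. That reduces the number of items each reducer mines per group, but it does not bound how many transaction prefixes a single group receives from 150M transactions, which is the sparse-branch problem described at the top of this message.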
