That is good advice for Mahout as well. Running on a smaller sample will tell you whether the problem is poor scaling or a genuine hang.
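As a rough illustration of both suggestions in this thread (take a sample, then look at the distribution), here is a minimal Python sketch. It assumes the input is a text file with one transaction per line and items separated by tabs, which is a guess based on the `-regex [\\t]` option in the original command. The function names `sample_lines` and `frequency_by_rank` are made up for this example and are not part of Mahout.

```python
import random
from collections import Counter


def sample_lines(lines, fraction, seed=0):
    """Keep each transaction line independently with probability `fraction`."""
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    return [line for line in lines if rng.random() < fraction]


def frequency_by_rank(lines, sep="\t"):
    """Return (rank, item, count) rows, most frequent item first.

    Each item is counted at most once per transaction, so `count` is the
    number of transactions that contain the item.
    """
    counts = Counter()
    for line in lines:
        # Drop empty fields and duplicate items within one transaction.
        items = {field for field in line.rstrip("\n").split(sep) if field}
        counts.update(items)
    # Sort by descending count, then by item name so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, item, n) for rank, (item, n) in enumerate(ordered, start=1)]
```

To use it, read the transaction file into a list of lines, call `sample_lines(lines, 0.01)` for roughly a 1% sample, and print the first rows of `frequency_by_rank(sample)`. If the top few items appear in nearly every transaction, that is the kind of skew described below.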

On Sat, Sep 18, 2010 at 12:03 PM, Mark <[email protected]> wrote:

> Thanks. Ill give this a try and see how it performs
>
> On 9/18/10 12:01 PM, Neal Richter wrote:
>
>> I suggest you take a sample of your data and run it on these
>> non-hadoop implementations of itemset miners, FPGrowth is one of the
>> available algorithms.
>>
>> http://www.borgelt.net/fpm.html
>>
>> If you have success on a small sample then start upscaling the sample
>> as well as investigate the distributions of your data.
>>
>> - Neal
>>
>> On Sat, Sep 18, 2010 at 12:30 PM, Ted Dunning <[email protected]> wrote:
>>
>>> In order to encourage your excellent practice of reposting, I will
>>> repeat my (non)-answer here.
>>>
>>> -------------------------------------------
>>> I don't know the answer to this, but previously this kind of problem
>>> was caused by highly skewed statistics in the input data.
>>>
>>> If there are things that cooccur with everything, then you will have
>>> problems with the speed of the algorithm.
>>>
>>> Can you say something about the distribution of your data? Can you
>>> post a frequency by rank table?
>>>
>>> On Sat, Sep 18, 2010 at 10:37 AM, Mark <[email protected]> wrote:
>>>
>>>> I am trying to run FPGrowth:
>>>>
>>>> /hadoop jar /opt/mahout-0.3/mahout-examples-0.3.job
>>>> org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver -i
>>>> output/product/part-r-00000 -o pfp -method mapreduce -regex [\\t]
>>>> -s 5 -g 17500 -k 50/
>>>>
>>>> However the 3rd task:/ "Processing FPTree: Bottom Up FP Growth>
>>>> reduce"/ will not finish. It's basically stuck at 85% and hasn't
>>>> budged in over an hour. The output of the first task outputted there
>>>> were about 37K features so I set -g to 17500. Does anyone know whats
>>>> going on and how I can speed this up?
>>>>
>>>> Thanks
