Any clues on what configuration this could be?

On 9/21/10 11:27 AM, Ted Dunning wrote:
This very much sounds like a Hadoop config problem. Other users have used Mahout to compute frequent item sets over billions of items.

On Tue, Sep 21, 2010 at 11:09 AM, Mark <[email protected]> wrote:

Smaller samples work. It seems like any time more than one reduce task is launched, the job hangs and never finishes. Is this a possible Hadoop configuration bug?
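
A minimal sketch of how to inspect the MapReduce settings in play, assuming Hadoop 0.20-style property names and a standard conf/ layout (neither is confirmed in this thread):

    # Settings that commonly matter when reduce tasks stall
    grep -A1 -E 'mapred.reduce.tasks|mapred.task.timeout|mapred.child.java.opts' \
        $HADOOP_HOME/conf/mapred-site.xml

    # Watch the job itself rather than the console percentage
    hadoop job -list               # find the running job id
    hadoop job -status <job_id>    # per-phase progress and counters

Whether -D overrides reach the Mahout 0.3 driver depends on whether it runs through ToolRunner, so editing conf/mapred-site.xml on the submitting machine is the safer route for job-level properties.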

On 9/18/10 12:08 PM, Ted Dunning wrote:

Good advice relative to Mahout as well. Trying it on a smaller sample will tell you if it is due to bad scaling or really a hangup.

On Sat, Sep 18, 2010 at 12:03 PM, Mark <[email protected]> wrote:

Thanks. I'll give this a try and see how it performs.

On 9/18/10 12:01 PM, Neal Richter wrote:

I suggest you take a sample of your data and run it on these non-Hadoop implementations of itemset miners; FPGrowth is one of the available algorithms.

http://www.borgelt.net/fpm.html

If you have success on a small sample, then start scaling up the sample and investigating the distributions of your data.

- Neal
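
For reference, a minimal sketch of that sampling workflow, assuming the transactions sit in a plain-text file with one transaction per line; the file names and the -s value are illustrative, not from the thread:

    # Pull a random sample of transactions (GNU coreutils shuf)
    shuf -n 100000 transactions.txt > sample.txt

    # Run Borgelt's single-machine miner on the sample; -s sets the minimum
    # support threshold. Run fpgrowth with no arguments to see the exact
    # option syntax of the version you download.
    ./fpgrowth -s0.1 sample.txt sample-itemsets.txt

If the sample finishes quickly, grow it by roughly 10x per step until it either slows down badly or you are satisfied the hang is on the Hadoop side.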

On Sat, Sep 18, 2010 at 12:30 PM, Ted Dunning <[email protected]> wrote:

In order to encourage your excellent practice of reposting, I will repeat my (non)-answer here.

-------------------------------------------
I don't know the answer to this, but previously this kind of problem was caused by highly skewed statistics in the input data.

If there are things that cooccur with everything, then you will have
problems with the speed of the algorithm.

Can you say something about the distribution of your data? Can you post a frequency-by-rank table?
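
One way to produce such a table, assuming the input really is plain text with one tab-separated transaction per line (which the -regex [\\t] option in the quoted command suggests, but is not confirmed):

    # Split transactions into items, count them, and emit rank<TAB>count,
    # most frequent first.
    cat output/product/part-r-00000 \
      | tr '\t' '\n' | sort | uniq -c | sort -rn \
      | awk '{print NR "\t" $1}' > freq-by-rank.txt

    head freq-by-rank.txt

A handful of items whose counts approach the number of transactions would match the "cooccur with everything" case described above.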

On Sat, Sep 18, 2010 at 10:37 AM, Mark <[email protected]> wrote:

I am trying to run FPGrowth:

hadoop jar /opt/mahout-0.3/mahout-examples-0.3.job \
  org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver \
  -i output/product/part-r-00000 -o pfp -method mapreduce \
  -regex [\\t] -s 5 -g 17500 -k 50

However, the third step ("Processing FPTree: Bottom Up FP Growth > reduce") will not finish. It's basically stuck at 85% and hasn't budged in over an hour. The output of the first job showed there were about 37K features, so I set -g to 17500. Does anyone know what's going on and how I can speed this up?

Thanks
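
As a starting point on the original question, a sketch of how to see what the stuck reduce attempt is actually doing, assuming shell access to the TaskTracker node running it; paths and placeholders are illustrative, not from the thread:

    # Locate the reduce task's child JVM and take a thread dump
    jps -lm | grep Child                   # child task JVMs on this node
    jstack <pid> > reduce-threaddump.txt   # see whether it is computing or blocked

    # Tail the attempt's logs (default 0.20 layout; adjust for your install)
    tail -f $HADOOP_HOME/logs/userlogs/<attempt_id>/syslog

If the dump shows the reducer busy inside FP-growth code rather than waiting on I/O, the job is more likely grinding on a skewed group than hung on configuration.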


