If you are looking for a good source of data, try:

http://fimi.cs.helsinki.fi/data/

Thanks,

Neal.

On Sat, Sep 18, 2010 at 12:38 PM, Neal Richter <[email protected]> wrote:

> +1
>
> Try data other than your own as well.
>
>
>
>
> On 9/18/10, Ted Dunning <[email protected]> wrote:
> > Good advice relative to Mahout as well. Trying it on a smaller sample will
> > tell you if it is due to bad scaling or really a hangup.
> >
> > On Sat, Sep 18, 2010 at 12:03 PM, Mark <[email protected]> wrote:
> >
> >> Thanks. I'll give this a try and see how it performs.
> >>
> >>
> >> On 9/18/10 12:01 PM, Neal Richter wrote:
> >>
> >>> I suggest you take a sample of your data and run it on these
> >>> non-Hadoop implementations of itemset miners; FPGrowth is one of the
> >>> available algorithms.
> >>>
> >>> http://www.borgelt.net/fpm.html
> >>>
> >>> If you have success on a small sample, then start scaling the sample up
> >>> and investigate the distribution of your data.
> >>>
> >>> - Neal
> >>>
> >>> On Sat, Sep 18, 2010 at 12:30 PM, Ted Dunning <[email protected]> wrote:
> >>>
> >>>> In order to encourage your excellent practice of reposting, I will
> >>>> repeat my (non)-answer here.
> >>>>
> >>>> -------------------------------------------
> >>>> I don't know the answer to this, but previously this kind of problem
> >>>> was caused by highly skewed statistics in the input data.
> >>>>
> >>>> If there are things that cooccur with everything, then you will have
> >>>> problems with the speed of the algorithm.
> >>>>
> >>>> Can you say something about the distribution of your data?  Can you
> >>>> post a frequency by rank table?
> >>>>
> >>>> On Sat, Sep 18, 2010 at 10:37 AM, Mark <[email protected]> wrote:
> >>>>
> >>>>> I am trying to run FPGrowth:
> >>>>>
> >>>>> hadoop jar /opt/mahout-0.3/mahout-examples-0.3.job \
> >>>>>   org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver \
> >>>>>   -i output/product/part-r-00000 -o pfp -method mapreduce \
> >>>>>   -regex [\\t] -s 5 -g 17500 -k 50
> >>>>>
> >>>>> However, the 3rd task, "Processing FPTree: Bottom Up FP Growth >
> >>>>> reduce", will not finish. It's basically stuck at 85% and hasn't
> >>>>> budged in over an hour. The output of the first task reported about
> >>>>> 37K features, so I set -g to 17500. Does anyone know what's going on
> >>>>> and how I can speed this up?
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>>
> >
>
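
The sampling step suggested earlier in the thread could look roughly like the sketch below. It assumes one transaction per line, as in the file passed to -i; the function name sample_lines and the fixed seed are illustrative choices, not part of Mahout or of Borgelt's tools.

```python
import random


def sample_lines(lines, fraction, seed=0):
    """Keep each transaction independently with probability `fraction`.

    `lines` is any iterable of transaction strings (one transaction per
    line is an assumption). A fixed seed makes the sample repeatable, so
    a slow run can be reproduced on exactly the same subset.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


if __name__ == "__main__":
    # Toy input; in practice the lines would be read from the input file.
    transactions = [f"item{i}\titem{i + 1}" for i in range(1000)]
    subset = sample_lines(transactions, 0.1, seed=42)
    print(len(subset), "of", len(transactions), "transactions kept")
```

Growing the fraction step by step (for example 1%, 5%, 25%) and timing each run is one way to see whether the runtime scales badly or the job is really hanging.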

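The frequency by rank table requested earlier in the thread can be produced with a few lines of Python. This is only a sketch: it assumes tab-separated items with one transaction per line (matching the -regex [\\t] option in the original command), and the names frequency_by_rank and format_table are made up for illustration.

```python
import re
from collections import Counter


def frequency_by_rank(lines, sep=r"\t"):
    """Return (item, count) pairs, most frequent first.

    The count is the number of transactions containing the item, so an
    item repeated within one line is counted once. Ties are broken
    alphabetically to keep the output deterministic.
    """
    counts = Counter()
    for line in lines:
        items = {tok for tok in re.split(sep, line.strip()) if tok}
        counts.update(items)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def format_table(ranked, n_transactions, top=20):
    """Render the top rows as a tab-separated rank/item/count/share table."""
    rows = ["rank\titem\tcount\tshare"]
    for rank, (item, count) in enumerate(ranked[:top], start=1):
        rows.append(f"{rank}\t{item}\t{count}\t{count / n_transactions:.3f}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Toy data; "a" occurs in every transaction, the kind of skew that
    # was described above as a likely cause of slow runs.
    sample = ["a\tb\tc", "a\tb", "a\td", "a"]
    print(format_table(frequency_by_rank(sample), len(sample)))
```

Items whose share is close to 1.0 cooccur with nearly everything and are the first candidates to inspect or filter before rerunning the job.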