This very much sounds like a Hadoop config problem.  Other users have used
Mahout to compute frequent item sets over billions of items.

On Tue, Sep 21, 2010 at 11:09 AM, Mark <[email protected]> wrote:

>  Smaller samples work. It seems that any time more than one reduce task is
> launched, the job hangs and never finishes. Is this possibly a Hadoop
> configuration bug?
>
> On 9/18/10 12:08 PM, Ted Dunning wrote:
>
>> Good advice relative to Mahout as well.  Trying it on a smaller sample will
>> tell you if it is due to bad scaling or really a hangup.
>>
>> On Sat, Sep 18, 2010 at 12:03 PM, Mark<[email protected]>  wrote:
>>
>>   Thanks. I'll give this a try and see how it performs.
>>>
>>>
>>> On 9/18/10 12:01 PM, Neal Richter wrote:
>>>
>>>  I suggest you take a sample of your data and run it on these
>>>> non-Hadoop implementations of itemset miners; FPGrowth is one of the
>>>> available algorithms.
>>>>
>>>> http://www.borgelt.net/fpm.html
>>>>
>>>> If you have success on a small sample, then start scaling up the sample
>>>> and investigate the distribution of your data.
>>>>
>>>> - Neal
>>>>
>>>> On Sat, Sep 18, 2010 at 12:30 PM, Ted Dunning<[email protected]>
>>>>  wrote:
>>>>
>>>>  In order to encourage your excellent practice of reposting, I will
>>>>> repeat my (non)-answer here.
>>>>>
>>>>> -------------------------------------------
>>>>> I don't know the answer to this, but previously this kind of problem
>>>>> was caused by highly skewed statistics in the input data.
>>>>>
>>>>> If there are things that cooccur with everything, then you will have
>>>>> problems with the speed of the algorithm.
>>>>>
>>>>> Can you say something about the distribution of your data?  Can you
>>>>> post a frequency by rank table?
>>>>>
>>>>> On Sat, Sep 18, 2010 at 10:37 AM, Mark<[email protected]>
>>>>>  wrote:
>>>>>
>>>>>   I am trying to run FPGrowth:
>>>>>
>>>>>> hadoop jar /opt/mahout-0.3/mahout-examples-0.3.job
>>>>>> org.apache.mahout.fpm.pfpgrowth.FPGrowthDriver -i
>>>>>> output/product/part-r-00000 -o pfp -method mapreduce -regex [\\t] -s 5
>>>>>> -g 17500 -k 50
>>>>>>
>>>>>> However, the 3rd task, "Processing FPTree: Bottom Up FP Growth >
>>>>>> reduce", will not finish. It's basically stuck at 85% and hasn't
>>>>>> budged in over an hour. The output of the first task showed there
>>>>>> were about 37K features, so I set -g to 17500. Does anyone know
>>>>>> what's going on and how I can speed this up?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>
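
The frequency-by-rank table requested earlier in the thread could be produced
with something like the sketch below. It is only an illustration and makes
assumptions not confirmed in the thread: one transaction per line, items
separated by tabs (matching the -regex [\\t] option in the command), and each
item counted once per transaction. The function name frequency_by_rank is
hypothetical and is not part of Mahout.

```python
from collections import Counter


def frequency_by_rank(lines, sep="\t"):
    """Return (rank, item, count) tuples, most frequent item first.

    Assumes one transaction per line with items separated by `sep`.
    Each item is counted at most once per transaction.
    """
    counts = Counter()
    for line in lines:
        # De-duplicate items within a transaction and drop empty tokens.
        items = {tok.strip() for tok in line.rstrip("\n").split(sep)}
        items.discard("")
        counts.update(items)
    # Sort by descending count, breaking ties alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, item, count)
            for rank, (item, count) in enumerate(ranked, start=1)]


if __name__ == "__main__":
    # Tiny made-up sample; replace with lines read from the real input file.
    sample = ["a\tb\tc", "a\tb", "a\td", "a"]
    for rank, item, count in frequency_by_rank(sample):
        print(f"{rank}\t{item}\t{count}")
```

If the top-ranked items appear in nearly every transaction, that would point
to the kind of skew described above; a flat table would make a Hadoop
configuration issue the more likely explanation.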
