Re: Increase timeout for running PFPGrowth

2012-10-22 Thread
-g is the number of groups used when executing FP-Growth. It equals the number of reduce tasks, so I suggest using the same number as the reducers in your cluster. -k is the cache that will be kept, so it can be larger if you have a lot of memory on a single node. On Tuesday, October 23, 2012,

Re: Help on understanding clusterdump output

2012-09-11 Thread
tor. > > It's explained at the bottom of this page > https://cwiki.apache.org/MAHOUT/cluster-dumper.html > > > On 12-09-2012 11:24, 戴清灏 wrote: > >> Hi All, >> I was trying to run KMeans clustering algori

Help on understanding clusterdump output

2012-09-11 Thread
Hi All, I was trying to run the KMeans clustering algorithm, and it executed successfully. After clusterdump, I want to get a better understanding of the output: *Key: CL-526: Value: CL-526{n=133 c=[44.729, 0.429, 0.977, 28487.102, 0.639, 0.970, 0.571, 0.647, 0.737, 0.383, 0.541] r=[9.890, 0.495
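
The quoted line is cut off, so the following is only a rough sketch. It assumes the usual reading of the clusterdump fields (n as the number of points in the cluster, c as the centroid, r as the per-dimension radius). The parse_cluster helper and the shortened sample string, including its last r value, are made up for illustration and are not part of Mahout.

```python
import re

# Hypothetical helper, not part of Mahout: splits one clusterdump value of
# the form ID{n=<count> c=[...] r=[...]} into its parts.
CLUSTER_RE = re.compile(
    r"(?P<id>[A-Z]+-\d+)\{n=(?P<n>\d+)\s+"
    r"c=\[(?P<c>[^\]]*)\]\s+r=\[(?P<r>[^\]]*)\]\}"
)


def parse_vector(text):
    """Turn '44.729, 0.429' into [44.729, 0.429]."""
    return [float(x) for x in text.split(",") if x.strip()]


def parse_cluster(value):
    """Return the id, point count, centroid and radius of one cluster."""
    match = CLUSTER_RE.search(value)
    if match is None:
        raise ValueError("not a recognised cluster value: %r" % value)
    return {
        "id": match.group("id"),
        "n": int(match.group("n")),              # assumed: points in the cluster
        "centroid": parse_vector(match.group("c")),
        "radius": parse_vector(match.group("r")),
    }


# Shortened, made-up sample in the same shape as the output quoted above.
sample = "CL-526{n=133 c=[44.729, 0.429, 0.977] r=[9.890, 0.495, 0.150]}"
cluster = parse_cluster(sample)
print(cluster["id"], cluster["n"], len(cluster["centroid"]))
```

Read this way, the centroid and radius vectors have one entry per input feature, so the large fourth value of c in the quoted output would simply be the mean of a feature measured on a much larger scale than the others.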

Re: Number of Reducers in PFP Growth is always 1 !!!

2012-08-30 Thread
> and conf.set("mapred.reduce.tasks","100"). > However, it does not seem to take the number of reducers at all, even for > the job that does parallel counting. Any advice would be appreciated. > Regards, > Krishnakumar. > On Aug 29, 2012, at 11:28 PM, 戴清灏 wrote:

Re: Number of Reducers in PFP Growth is always 1 !!!

2012-08-29 Thread
I suspect that you specified the setting in the Hadoop config XML file. -- Regards, Q 2012/8/30 C.V. Krishnakumar Iyer > Hi, > > Quick question regarding PFPGrowth in Mahout 0.6: > > I see that there are no options to set the number of reducers in the > parallel counting phase of PFP Growth. It is jus

Re: Number of reduce tasks of PFP

2012-04-30 Thread
> How does it work with multiple reduce tasks? > > > > > 2012/4/30 戴清灏 > > > Then how big is your input data? > > For a rather small dataset, one reduce task is enough to process it. > > > > Regards, > > Q > > > > > > > > 201

Re: Number of reduce tasks of PFP

2012-04-30 Thread
ch node or only master node? > > Thanks for your help > > > 2012/4/30 戴清灏 > > > Sorry for confusing you. > > I mean, if you have explicitly specified the reduce task number in your > > hadoop conf/mapred-site.xml or somewhere else, > > PFP would o

Re: Number of reduce tasks of PFP

2012-04-30 Thread
Sorry for confusing you. I mean, if you have explicitly specified the reduce task number in your hadoop conf/mapred-site.xml or somewhere else, PFP would only execute one reduce task. Your groups parameter of 10 would only make PFP call the reduce method 10 times. Actually reduce method had been

Re: Number of reduce tasks of PFP

2012-04-29 Thread
reducer or reduce tasks? In PFP, there is a parameter called groups. For each group, PFP will generate a reduce task, not a reducer. You can specify how many reducers you want by modifying your XML configuration file in Hadoop. Regards, Q 2012/4/29 Wenhao Xu > Edit the source code as you like.

Re: Mahout Clustering and HBase

2012-03-23 Thread
Hi, I think they are for different use cases. HBase is for high concurrency, while Mahout is for data mining & machine learning. There may not be so many people running Mahout at the same time. Regards, Q On March 23, 2012, at 10:11 PM, 戴睿 wrote: > Hello, > I'm new to Mahout, and I've read Support of > HBas

Re: Frequent itemset mining

2011-12-01 Thread
u! > > In the meantime, can you direct me to where in the source I should start > looking? (i.e., which class would be the entry point I'm looking for?) > > 2011/12/1 戴清灏 > > > There is actually a lack of documentation on frequent pattern mining > usage. > > Actuall

Re: Frequent itemset mining

2011-12-01 Thread
There is actually a lack of documentation on frequent pattern mining usage. Actually, you are not the first one to ask for it. I would be pleased to write some, since I've read almost all of its source code. On Friday, December 2, 2011, Dave Fry wrote: > Hi! I apologize for the newbie ques

Re: Trouble understanding how to use the FP_Growth algorithm

2011-11-21 Thread
Now it's morning in China. Morning! I have woken up. You may try this: there is another, sequential implementation of FP-Growth by Borgelt. The link is here: http://www.borgelt.net/fpgrowth.html You may download it. After compiling, you can try to run it on the same dataset with the same argument

Re: Trouble understanding how to use the FP_Growth algorithm

2011-11-21 Thread
only single item itemset? Maybe your dataset is too sparse, or you could lower your support value (the default is 50). Regards, Q 2011/11/21 Sébastien Noir > Hi! > > I'm currently trying to understand how to use the implementation of the > FPGrowth algorithm (see : > https://cwiki.apache.org/MAHOUT/pa

Re: Can Mahout work with cores instead of cpus?

2011-10-16 Thread
Yes. I do agree with you. Sent from my mobile phone. On 2011-10-16 at 6:20 PM, "Sean Owen" wrote: > I think what you're really asking is, can I run a Mahout job on Hadoop > using 1 worker, 2 workers, 4 workers? Yes. But this is a Hadoop > question, not Mahout. > > I'd also say that you are probably going t

About frequent pattern mining

2011-10-14 Thread
Hi, all, I am reading Mahout's FP-Growth code now, and I am a little bit confused about its implementation, specifically the grouping part. Why grouping? And during the parallel FP-Growth, why does it ignore some items? Example: a transaction: A,B,C,D If A and B belong to the same group, the
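
The question is cut off here. As a hedged illustration only, the sketch below follows the group-dependent transaction step described in the PFP paper, which may differ in detail from what the Mahout code does. The function name and the assignment of A and B to one group and C and D to another are assumptions made up for the example.

```python
def group_dependent_transactions(transaction, group_of):
    """Split one transaction into per-group prefixes.

    transaction: items already ordered by descending global frequency.
    group_of:    hypothetical mapping from item to group id.
    Returns {group id: prefix of the transaction sent to that group}.
    """
    emitted = {}
    # Scan from the least frequent item back toward the most frequent one.
    for pos in range(len(transaction) - 1, -1, -1):
        gid = group_of[transaction[pos]]
        # Only the rightmost item of each group triggers an emit; earlier
        # items of the same group are already covered by that prefix.
        if gid not in emitted:
            emitted[gid] = transaction[: pos + 1]
    return emitted


# Assumed grouping: A and B share group 0, C and D share group 1.
group_of = {"A": 0, "B": 0, "C": 1, "D": 1}
shards = group_dependent_transactions(["A", "B", "C", "D"], group_of)
for gid in sorted(shards):
    print(gid, shards[gid])
```

Under these assumptions, group 0 receives only the prefix A,B and group 1 receives A,B,C,D. Nothing is emitted separately for A or for C, which can look like items being ignored, but every pattern ending in A or C is still contained in the prefix that its group already received.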

Re: LDA with Mahout

2011-10-08 Thread
If you can understand Chinese, you may read a post on my blog. It may help you build Mahout. http://rogerdai16.wordpress.com/2011/08/20/%E6%89%93%E5%8C%85mahout/ 2011/10/8 Guang Xiang > Hi, all, > > >I want to run LDA using Mahout on some large data set, but have no > idea how to proceed.

Re: PfgGrowth job got stuck when run into fpGrowth.generateTopKFrequentPatterns

2011-09-22 Thread
Hi, PFP-Growth on my cluster is fine when I run it on the *webdocs.dat* dataset (http://fimi.ua.ac.be/data/). My cluster has about 15 nodes, and it takes less than 10 minutes. I looked into your log and found cache misses. Which version of Mahout are you using? For mahout

Re: Mahout project running in eclipse

2011-08-19 Thread
any IM? I may help you step by step. My QQ: 175162478 :-) On August 19, 2011, at 4:33 PM, 张玉东 wrote: > You mean that I can create a project in Eclipse, then build it with Maven? > Do you have any guidelines or websites on this issue that I can refer to? > > -Original Message- > From: 戴清灏 [mailto:rogerda

Re: Mahout project running in eclipse

2011-08-19 Thread
Yes, of course. It's a packaging tool. On August 19, 2011, at 4:26 PM, 张玉东 wrote: > Can Maven build my own projects developed based on Mahout? > > -----Original Message----- > From: 戴清灏 [mailto:rogerda...@gmail.com] > Sent: August 19, 2011, 16:16 > To: user@mahout.apache.org > Subject: Re: Mahout project r

Re: Mahout project running in eclipse

2011-08-19 Thread
:03, 张玉东 wrote: > It is OK to run Mahout from the command line. I do not know whether > Mahout supports "run on hadoop" from Eclipse. Apparently, some > basic classes are not transported to the datanodes. > > -Original Message- > From: 戴清灏 [mailto:rogerda...@gmail.com

Re: Mahout project running in eclipse

2011-08-19 Thread
Try to run mahout-*-job.jar, not any other jar. Is your Mahout version 0.5? On August 19, 2011, at 3:44 PM, 张玉东 wrote: > Dear Mahouters, > I am new to Mahout. I am trying to set up Mahout in Eclipse running on Windows > and execute it on a remote Linux-based Hadoop cluster. However, when I > test the KMeans exam

Re: On building decision tree in mahout

2011-08-11 Thread
Thanks, I will try. On August 11, 2011, at 11:00 PM, Xiaobo Gu wrote: > You can try Decision Tree and Random Forest in Weka; it's more convenient. > > 2011/8/11 戴清灏 : > > I can only see how to build a random forest, not a decision tree. Can > anyone > > provide ideas? > > >

Re: On building decision tree in mahout

2011-08-10 Thread
I can only see how to build a random forest, not a decision tree. Can anyone provide ideas? From my mobile phone. On 2011-8-11 at 1:06 AM, "Ted Dunning" wrote: > Mahout does not have a lot of support for decision trees. > > On Wed, Aug 10, 2011 at 12:25 AM, 戴清灏 wrote: > >> Hi l

On building decision tree in mahout

2011-08-10 Thread
Hi list, I have recently been trying to build a decision tree in Mahout. I searched the wiki and the list, only to find decision forest (partial implementation). Is there any way in which I can build a decision tree? Regards Roger Dai

Re: this is a bug of mahout 0.5 ?

2011-08-05 Thread
I have the same problem. My solution is to run mahout-*-job.jar and specify the class. From my mobile phone. On 2011-8-5 at 7:41 PM, "air" wrote: > when I use mahout 0.4 to execute on a hadoop cluster: > * > ./mahout org.apache.mahout.clustering.syntheticcontrol.canopy.Job --input > /mahout/input --output

Re: Understanding Mahout Algos and Applications

2011-08-03 Thread
Hi, I think the core issue is not how this engine works, but whether Mahout fits your data size. Mahout is built on Hadoop, which digests big data. If your data is not that large, or is incompatible with the MapReduce model, it may not be a good idea. Regards. Roger 2011/8/4 Josh Dul

Re: about the implementation of FP-Growth

2011-08-03 Thread
Hi, I am reading the source code of FPGrowth in Mahout too. I guess this implementation is based on this paper: paper I still have another question: how does this algorithm keep consistency across the MapReduce chain? Hopefully I can hav