Classification Algorithms in Mahout

2013-03-23 Thread Chidananda Sridhar
Hi, I am doing a class project on classification and want to use Mahout. I was searching for the classification algorithms already implemented in Mahout and came to this page: https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms The webpage says that Online Passive Aggressive

Fwd: Algorithms in Mahout

2013-11-25 Thread unmesha sreeveni
I have gone through http://mahout.apache.org for some data mining algorithms already implemented on the Hadoop platform. From that I understood that 1. K-means 2. Decision Tree 3. Naive Bayes have implementations on the Hadoop platform. And for 4. DBSCAN 5. k-nearest neighbor 6. SVM 7. Logistic Regr

Re: Algorithms in Mahout

2013-11-25 Thread Manuel Blechschmidt
Hi Unmesha, please also consult JIRA as a source for algorithms; there you will find implementations or discussions, e.g. for neural networks a.k.a. multilayer perceptrons: https://issues.apache.org/jira/browse/MAHOUT-1265 https://issues.apache.org/jira/browse/MAHOUT-976 SVM: https://issues.apache.org/

Re: Algorithms in Mahout

2013-11-25 Thread Ted Dunning
On Mon, Nov 25, 2013 at 3:14 AM, Manuel Blechschmidt <manuel.blechschm...@gmx.de> wrote: > There are/were multiple kNN implementations in Mahout: > Recommender kNN > http://grepcode.com/file/repo1.maven.org/maven2/org.apache.mahout/mahout-core/0.6/org/apache/mahout/cf/taste/impl/recommender/knn/Op

Re: Algorithms in Mahout

2013-11-25 Thread Dhruv
Distributed Hidden Markov Model trainer using Baum Welch Algorithm is also available as a patch. Please see the JIRA issue MAHOUT-627. On Mon, Nov 25, 2013 at 8:07 AM, Ted Dunning wrote: > On Mon, Nov 25, 2013 at 3:14 AM, Manuel Blechschmidt < > manuel.blechschm...@gmx.de> wrote: > > > There ar

Re: Algorithms in Mahout

2013-11-25 Thread Suneel Marthi
Dhruv, Could you update the patch to the present trunk codebase and also create a Wiki page for this? On Monday, November 25, 2013 1:04 PM, Dhruv wrote: Distributed Hidden Markov Model trainer using Baum Welch Algorithm is also available as a patch. Please see the JIRA issue MAHOUT-627. On M

Re: Algorithms in Mahout

2013-11-25 Thread unmesha sreeveni
Thanks for the replies. I will go through those links. Thanks for spending time on me :) On Mon, Nov 25, 2013 at 11:59 PM, Suneel Marthi wrote: > Dhruv, > > Could u update the patch to present trunk codebase and also create a Wiki > page for this? > > > > > > On Monday, November 25, 2013 1:04 PM,

Re: Classification Algorithms in Mahout

2013-03-24 Thread Ted Dunning
You are correct to suspect that this page is substantially out of date. Currently, Mahout has the following classifiers: - stochastic gradient descent for logistic regression (SGD) with L_1 or L_2 regularization, sequential version only. These classifiers can be easily extended with other grad
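To make "stochastic gradient descent for logistic regression with L_2 regularization, sequential version" concrete, here is a minimal plain-Java sketch of the idea. It is not Mahout's implementation and uses none of Mahout's classes; the class name, the toy one-feature data, the learning rate, and the penalty strength are all assumptions chosen for illustration.

```java
import java.util.Random;

// Hypothetical illustration only: a sequential SGD trainer for logistic
// regression with an L2 penalty. Not Mahout code; all names are made up.
final class SgdLogisticSketch {
    final double[] w;            // w[0] is the bias, w[1..] the feature weights
    final double learningRate;   // assumed step size
    final double lambda;         // assumed L2 penalty strength

    SgdLogisticSketch(int numFeatures, double learningRate, double lambda) {
        this.w = new double[numFeatures + 1];
        this.learningRate = learningRate;
        this.lambda = lambda;
    }

    // Probability that x belongs to class 1 (logistic function of w . x).
    double predict(double[] x) {
        double z = w[0];
        for (int i = 0; i < x.length; i++) {
            z += w[i + 1] * x[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // One SGD step on a single example; label must be 0 or 1.
    void train(double[] x, int label) {
        double err = label - predict(x);
        w[0] += learningRate * err;  // the bias is left unregularized
        for (int i = 0; i < x.length; i++) {
            // log-likelihood gradient minus the L2 shrinkage term
            w[i + 1] += learningRate * (err * x[i] - lambda * w[i + 1]);
        }
    }

    // Trains on a toy stream: one feature in [-1, 1), label 1 when it is positive.
    static SgdLogisticSketch trainToy() {
        SgdLogisticSketch model = new SgdLogisticSketch(1, 0.1, 0.001);
        Random rnd = new Random(42);
        for (int step = 0; step < 5000; step++) {
            double x = rnd.nextDouble() * 2 - 1;
            model.train(new double[] {x}, x > 0 ? 1 : 0);
        }
        return model;
    }

    public static void main(String[] args) {
        SgdLogisticSketch model = trainToy();
        System.out.println("p(class 1 | x = 0.8)  = " + model.predict(new double[] {0.8}));
        System.out.println("p(class 1 | x = -0.8) = " + model.predict(new double[] {-0.8}));
    }
}
```

Because each step looks at one example and updates the weights in place, the trainer is naturally sequential, which matches the "sequential version only" remark above. An L_1 penalty would replace the shrinkage term with a sign-based one.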

Re: Classification Algorithms in Mahout

2013-03-24 Thread Ey-Chih chow
On Mar 24, 2013, at 1:00 AM, Ted Dunning wrote: > - random forest, sequential and parallel implementations, new versions are > being developed, the current version may or may not be useful to you. > Can you elaborate on the usefulness of the current version and the features of the new versions? Thank

Re: Classification Algorithms in Mahout

2013-03-24 Thread Ted Dunning
I think that there are some others who could say more. On Mon, Mar 25, 2013 at 6:01 AM, Ey-Chih chow wrote: > On Mar 24, 2013, at 1:00 AM, Ted Dunning wrote: > > > - random forest, sequential and parallel implementations, new versions > are being developed, the current version may or may not be

Re: Classification Algorithms in Mahout

2013-03-27 Thread Yutaka Mandai
My understanding is that the current Random Forest has a certain level of improvement for running on a Hadoop cluster, from a data-splitting alignment perspective, for better-balanced CPU utilization. Regards, Y.Mandai Sent from iPhone On 2013/03/25, at 14:48, Ted Dunning wrote: > I think that there are some

Re: Classification Algorithms in Mahout

2013-03-27 Thread Andy Twigg
Dear Ey-Chih, What are your use cases for a better random forest? On 27 March 2013 11:59, Yutaka Mandai <20525entrad...@gmail.com> wrote: > My understanding of current Random Forrest has a certain level of improvement > for running on Hadoop cluster from data splitting alignment perspective for

Re: Classification Algorithms in Mahout

2013-04-06 Thread ey-chih chow
I actually got a lot of overfitting. The parameter that I can adjust is minSplitNum. Are there any other parameters that I can adjust to avoid overfitting? Thanks. Ey-Chih On Wed, Mar 27, 2013 at 3:12 PM, Andy Twigg wrote: > Dear Ey-Chih, > > What are your use cases for a better random for

RE: Classification Algorithms in Mahout

2013-04-10 Thread Bhattacharjee, Rohan
Doesn't the "random" part of random forest defend against overfitting? -Original Message- From: ey-chih chow [mailto:eyc...@gmail.com] Sent: Saturday, April 06, 2013 5:45 PM To: user@mahout.apache.org Subject: Re: Classification Algorithms in Mahout I actually go

Re: Fwd: Algorithms in Mahout

2013-11-25 Thread Pavan K Narayanan
k-nearest neighbor, SVM, logistic regression, and neural nets exist in Mahout. Just type mahout and press enter and you'll see the list of algorithms available; type mahout algo-name -h to get detailed information about how to use/configure them. Pavan On Nov 25, 2013 2:44 PM, "unmesha sreeveni" wrote: >

Re: Fwd: Algorithms in Mahout

2013-11-25 Thread Sebastian Schelter
From the algorithms listed, only logistic regression (non-distributed) is implemented. Sorry for the confusion, we are currently reworking the wiki. On 25.11.2013 10:24, Pavan K Narayanan wrote: > k nearest neibhor, svm, logistic regression, neural nets exist in mahout . > just type mahout and

Re: Fwd: Algorithms in Mahout

2013-11-25 Thread unmesha sreeveni
So currently we don't have Decision Tree in the Mahout 0.6 release. On Mon, Nov 25, 2013 at 2:59 PM, Sebastian Schelter wrote: > From the algorithms listed, only logistic regression (non-distributed) > is implemented. > > Sorry, for the confusion, we are currently reworking the wiki. > > On 25.11.201

Reproducibility, and Recommender Algorithms in Mahout

2013-03-30 Thread Reinhard Denis Najogie
Dear all, I am doing experiments as a part of my final project. I'm comparing the performance of Mahout's implementations of recommender algorithms on some public datasets (so far bookcross and grouplens). I want to ask 2 questions: 1. The score (RMSE) results vary quite a bit each time I run an algorit

Problems with genetic algorithms in Mahout

2011-06-06 Thread Jose Fuentes De Frutos
Hello! I am working with genetic algorithms in Mahout. Specifically, I am trying to use the "MahoutEvaluator" to evaluate my own FitnessEvaluator based on the DummyEvaluator. The problem is that if I execute the program sequentially without Hadoop ($ java -jar geneticoJose2.jar) t

Re: Reproducibility, and Recommender Algorithms in Mahout

2013-03-30 Thread Sebastian Schelter
> 2. Where can I see the list of all recommender algorithms already > implemented by Mahout? From what I read on Mahout in Action book, there are > 6 algorithms: UserBased, ItemBased, Slope One, SVD, KnnItemBased, and > TreeClustering. Are there new algorithms since then? Oh, and We have a class

Re: Reproducibility, and Recommender Algorithms in Mahout

2013-03-30 Thread Sean Owen
You should be able to get reproducible random seed values by calling RandomUtils.useTestSeed() at the very start of your program. But if your goal is to get an unbiased view of the quality of results, you want to run several times and take the average, yes. On Sat, Mar 30, 2013 at 3:57 PM, Reinhard
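The two pieces of advice in this reply (fix the seed for reproducibility, average several runs for an unbiased view) can be sketched in plain Java. This sketch deliberately does not call Mahout; java.util.Random with an explicit seed stands in for the effect of RandomUtils.useTestSeed(), and evaluateOnce is a hypothetical placeholder that only simulates a noisy score rather than running a real recommender evaluation.

```java
import java.util.Random;

// Hypothetical illustration only: shows why a fixed seed makes results
// repeatable and how averaging over runs smooths out run-to-run noise.
final class SeededRunsSketch {

    // Placeholder for one evaluation run. The "score" is simulated noise
    // around 1.0, not a real RMSE from any dataset.
    static double evaluateOnce(Random rnd) {
        return 1.0 + 0.1 * rnd.nextGaussian();
    }

    // Runs the placeholder evaluation several times from one seeded generator
    // and returns the mean score.
    static double averageOverRuns(long seed, int runs) {
        Random rnd = new Random(seed);
        double sum = 0.0;
        for (int i = 0; i < runs; i++) {
            sum += evaluateOnce(rnd);
        }
        return sum / runs;
    }

    public static void main(String[] args) {
        // Same seed, same number of runs: the two averages are identical.
        System.out.println(averageOverRuns(42L, 10));
        System.out.println(averageOverRuns(42L, 10));
        // A different seed gives a different sample of runs.
        System.out.println(averageOverRuns(7L, 10));
    }
}
```

In a real experiment the seed would be fixed only while debugging or comparing code changes; for reporting quality, the average over several differently seeded runs is the more honest number.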

Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-24 Thread XiaoboGu
We will put a big SMP server to deploy Mahout. Regards, Xiaobo Gu

Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-24 Thread Stefan Wienert
Typically, they are all standalone... 2011/6/24 XiaoboGu : > We will put a big SMP server to deploy Mahout. > > Regards, > > Xiaobo Gu > > -- Stefan Wienert http://www.wienert.cc ste...@wienert.cc Telefon: +495251-2026838 Mobil: +49176-40170270

Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-24 Thread Sean Owen
No, many are implemented only on Hadoop. But you can run a one-machine Hadoop cluster if you like. So yes in that sense. On Fri, Jun 24, 2011 at 9:47 AM, XiaoboGu wrote: > We will put a big SMP server to deploy Mahout. > > Regards, > > Xiaobo Gu > >

Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-24 Thread Ted Dunning
Big iron is fine for some of the classifier stuff, but throughput per $ can be higher for other algorithms with a cluster of smaller machines. How big a machine are you talking about? Even relatively small machines are pretty massive any more. 8 core = 16 hyper-thread machines with 48GB seem to

RE: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-24 Thread XiaoboGu
32 cores, 256 GB RAM > -Original Message- > From: Ted Dunning [mailto:ted.dunn...@gmail.com] > Sent: Saturday, June 25, 2011 1:37 AM > To: user@mahout.apache.org > Cc: d...@mahout.apache.org > Subject: Re: Can all the algorithms in Mahout be run locally without a Hadoop

Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-24 Thread Ted Dunning
aturday, June 25, 2011 1:37 AM > > To: user@mahout.apache.org > > Cc: d...@mahout.apache.org > > Subject: Re: Can all the algorithms in Mahout be run locally without a > Hadoop cluster. > > > > Big iron is fine for some of the classifier stuff, but throughput per $

RE: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-24 Thread XiaoboGu
g > Subject: Re: Can all the algorithms in Mahout be run locally without a Hadoop > cluster. > > Pretty big. Should scream for local classifier learning. > > Local Hadoop should run pretty fast as well. > > On Fri, Jun 24, 2011 at 5:54 PM, XiaoboGu wrote: > > >

Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-24 Thread Ted Dunning
. > > > -Original Message- > > From: Ted Dunning [mailto:ted.dunn...@gmail.com] > > Sent: Saturday, June 25, 2011 9:26 AM > > To: user@mahout.apache.org > > Cc: d...@mahout.apache.org > > Subject: Re: Can all the algorithms in Mahout be run locally wi

Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-24 Thread Xiaobo Gu
>> > From: Ted Dunning [mailto:ted.dunn...@gmail.com] >> > Sent: Saturday, June 25, 2011 9:26 AM >> > To: user@mahout.apache.org >> > Cc: d...@mahout.apache.org >> > Subject: Re: Can all the algorithms in Mahout be run locally without a >> Hadoop

Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-24 Thread edwin
and task >> trackers on a single SMP server. >> >>> -Original Message- >>> From: Ted Dunning [mailto:ted.dunn...@gmail.com] >>> Sent: Saturday, June 25, 2011 9:26 AM >>> To: user@mahout.apache.org >>> Cc: d...@mahout.apache.org >>

RE: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-24 Thread XiaoboGu
10:12 AM > To: d...@mahout.apache.org > Cc: user@mahout.apache.org > Subject: Re: Can all the algorithms in Mahout be run locally without a Hadoop > cluster. > > I have done this with VM's but I would not generally recommend it. Without > VM's you will have a pretty

Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-24 Thread Chris Schilling
al Message----- >>>> From: Ted Dunning [mailto:ted.dunn...@gmail.com] >>>> Sent: Saturday, June 25, 2011 9:26 AM >>>> To: user@mahout.apache.org >>>> Cc: d...@mahout.apache.org >>>> Subject: Re: Can all the algorithms in Maho

Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-24 Thread Sean Owen
>> Do you have any experience in running multiple data nodes and task > >> trackers on a single SMP server. > >> > >>> -Original Message- > >>> From: Ted Dunning [mailto:ted.dunn...@gmail.com] > >>> Sent: Saturday, June 25, 2011 9:26 AM >

Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-25 Thread Xiaobo Gu
> wrote: >> > >> >> Do you have any experience in running multiple data nodes and task >> >> trackers on a single SMP server. >> >> >> >>> -Original Message- >> >>> From: Ted Dunning [mailto:ted.dunn...@gmail.co

Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-25 Thread Xiaobo Gu
.dunn...@gmail.com] >> > Sent: Saturday, June 25, 2011 9:26 AM >> > To: user@mahout.apache.org >> > Cc: d...@mahout.apache.org >> > Subject: Re: Can all the algorithms in Mahout be run locally without a >> Hadoop cluster. >> > >> >

Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-25 Thread Chris Schilling
le data nodes and task >>>> trackers on a single SMP server. >>>> >>>>> -Original Message- >>>>> From: Ted Dunning [mailto:ted.dunn...@gmail.com] >>>>> Sent: Saturday, June 25, 2011 9:26 AM >>>>> To: user@

Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-25 Thread Sean Owen
I think EMR is well worth using. I just think you do want to throw more, and smaller, machines at the task than you imagine. I used the 'small' instance but you might get away with a fleet of micro instances even. And do most certainly request spot instances for your workers (but pay full rate for

Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-25 Thread Chris Schilling
Okay, great. Thanks for the tips! On Jun 25, 2011, at 2:21 AM, Sean Owen wrote: > I think EMR is well worth using. I just think you do want to throw more, and > smaller, machines at the task than you imagine. I used the 'small' instance > but you might get away with a fleet of micro instances ev

Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-25 Thread Ted Dunning
nd task > >> trackers on a single SMP server. > >> > >>> -Original Message- > >>> From: Ted Dunning [mailto:ted.dunn...@gmail.com] > >>> Sent: Saturday, June 25, 2011 9:26 AM > >>> To: user@mahout.apache.org > >>> Cc:

Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-25 Thread Ted Dunning
I have had best results with somewhat beefier machines because you pay less VM overhead. Typical Hadoop configuration advice lately is 4GB per core and 1 disk spindle per two cores. For higher performance systems like MapR, the number of spindles can go up. On Sat, Jun 25, 2011 at 2:21 AM, Sean
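The rule of thumb quoted here (4 GB of RAM per core, one disk spindle per two cores) is simple arithmetic, sketched below in Java. The class and method names are made up for illustration, and rounding the spindle count up for odd core counts is an assumption; the message does not say how to round.

```java
// Hypothetical illustration only: applies the "4 GB per core, 1 spindle per
// two cores" rule of thumb from the message above to a given core count.
final class HadoopSizingSketch {

    // Suggested RAM in GB: 4 GB for every core.
    static int ramGbFor(int cores) {
        return 4 * cores;
    }

    // Suggested disk spindles: one per two cores, rounded up (assumption).
    static int spindlesFor(int cores) {
        return (cores + 1) / 2;
    }

    public static void main(String[] args) {
        int cores = 32; // the core count mentioned earlier in this thread
        System.out.println(cores + " cores -> " + ramGbFor(cores) + " GB RAM, "
                + spindlesFor(cores) + " spindles");
        // prints "32 cores -> 128 GB RAM, 16 spindles"
    }
}
```

By this rule the 32-core, 256 GB server discussed earlier in the thread has twice the suggested memory, so disk spindles rather than RAM would be the resource to check.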

Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-25 Thread Ken Krugler
On Jun 25, 2011, at 9:15am, Ted Dunning wrote: > I have had best results with somewhat beefier machines because you pay less > VM overhead. Definitely matches my experience. For example, with m1.small instances the I/O performance is dreadful. So we typically run with m1.large, and spot pricin

Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-25 Thread Ken Krugler
at 6:49 PM, XiaoboGu >>> wrote: >>>> >>>>> Do you have any experience in running multiple data nodes and task >>>>> trackers on a single SMP server. >>>>> >>>>>> -Original Message- >>>>>>

What about mentaining a short descriptive tables about each algorithms in Mahout on Wiki for new users?

2011-07-13 Thread Xiaobo Gu
Hi, Because I am a new user, I would appreciate a table like this: Algorithm Name Current Status Local-Run Commands Map-Reduce Run Commands Dataset file format LogisticRegression Production trainAdaptiveLogistic... N/A CSV with head

Re: What about mentaining a short descriptive tables about each algorithms in Mahout on Wiki for new users?

2011-07-13 Thread Ted Dunning
I think your table got kind of hashed up. Can you put your table on a Mahout wiki page? On Wed, Jul 13, 2011 at 7:11 AM, Xiaobo Gu wrote: > Hi > > Because I am a new user, so I will appreciate for a table like this: > > Algorithm Name Current Status Local-Run Commands > Map-Reduce Run

Re: What about mentaining a short descriptive tables about each algorithms in Mahout on Wiki for new users?

2011-07-13 Thread Sean Owen
The only objection to this is that it is yet another piece of information to be maintained, and there is a strong chance it will not quite be kept up to date. We already have a bit of doc rot in the javadoc itself, and the wiki (which is just par for the course in a volunteer project). I would fin

Re: What about mentaining a short descriptive tables about each algorithms in Mahout on Wiki for new users?

2011-07-13 Thread Ted Dunning
Agreed. And I would encourage the original poster to put their table onto the wiki and post a JIRA pointing to the classes they would like better javadoc in. It is always easier to respond to specifics than make open-ended wide-spread improvements (Sean is excepted from this generalization). On

RE: What about mentaining a short descriptive tables about each algorithms in Mahout on Wiki for new users?

2011-07-14 Thread XiaoboGu
1 3:04 AM > To: user@mahout.apache.org > Cc: d...@mahout.apache.org > Subject: Re: What about mentaining a short descriptive tables about each > algorithms in > Mahout on Wiki for new users? > > Agreed. And I would encourage the original poster to put their table onto > th