Re: SGD vs Naive Bayes for classification

2011-09-12 Thread Ted Dunning
They actually ported the liblinear algorithm, so you should get comparable results unless there are bugs. Early tests looked good, but they were only early tests. On Mon, Sep 12, 2011 at 2:32 PM, Zach Richardson wrote: > I haven't played with the one in Mahout. From what I understand they > wrapped ei
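To give a rough idea of what a liblinear-style linear SVM solver does, here is a minimal pure-Python sketch of dual coordinate descent for an L1-loss (hinge) linear SVM, the family of method behind liblinear's dual SVM solvers. It is not the Mahout port discussed in this thread: there is no bias term and no shrinking heuristic, and the function names, `C`, the epoch count, the sparse-dict feature format and the toy data are assumptions for illustration only.

```python
import random


def dot(w, x):
    """Sparse dot product of a weight dict and a feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm_dcd(examples, C=1.0, epochs=50, seed=0):
    """Dual coordinate descent for an L1-loss (hinge) linear SVM, no bias term.

    examples: list of (features, label); features is {term: value}, label is +1 or -1.
    Keeps one dual variable alpha_i in [0, C] per example and maintains
    w = sum_i alpha_i * y_i * x_i incrementally.
    """
    rng = random.Random(seed)
    n = len(examples)
    alpha = [0.0] * n
    w = {}
    q_diag = [sum(v * v for v in x.values()) for x, _ in examples]  # Q_ii = x_i . x_i
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a random order on each pass
        for i in order:
            x, y = examples[i]
            if q_diag[i] == 0.0:
                continue  # empty feature vector: nothing to learn from it
            g = y * dot(w, x) - 1.0  # gradient of the dual objective w.r.t. alpha_i
            # Projected gradient: zero when alpha_i sits at a bound and g points outward.
            if alpha[i] == 0.0:
                pg = min(g, 0.0)
            elif alpha[i] == C:
                pg = max(g, 0.0)
            else:
                pg = g
            if pg != 0.0:
                old = alpha[i]
                alpha[i] = min(max(old - g / q_diag[i], 0.0), C)  # clipped Newton step
                step = (alpha[i] - old) * y
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + step * v
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0.0 else -1


# Hypothetical toy data: term counts for two categories.
train = [
    ({"goal": 2, "match": 1}, 1),
    ({"goal": 1, "team": 1}, 1),
    ({"stock": 2, "market": 1}, -1),
    ({"market": 1, "shares": 1}, -1),
]
w = train_linear_svm_dcd(train)
print(predict(w, {"goal": 1}), predict(w, {"stock": 1}))  # 1 -1
```

Each update solves the one-variable subproblem for a single example exactly, which is why this kind of solver needs no learning-rate tuning, unlike plain SGD.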

Re: SGD vs Naive Bayes for classification

2011-09-12 Thread Zach Richardson
I haven't played with the one in Mahout. From what I understand, they wrapped either Liblinear or Libsvm, so you should get results comparable to running Libsvm from the command line or embedded in Rapidminer or Weka. On Mon, Sep 12, 2011 at 9:17 AM, Ted Dunning wrote: >

Re: SGD vs Naive Bayes for classification

2011-09-12 Thread Ted Dunning
Hard to say, and certainly not without a substantial amount of testing. The guy who did it seems pretty solid, but it has never been tested by anybody for production use. On Mon, Sep 12, 2011 at 12:54 AM, Loic Descotte wrote: > Mahout in Action is saying that SVM has been added to Mahout as "an >

Re: SGD vs Naive Bayes for classification

2011-09-12 Thread Loic Descotte
Hi Zach and Ted, Thanks a lot for your answers :) So I will try to focus on SVM instead of SGD/Naive Bayes. I'll also take a look at Rapid Miner and Luduan. Mahout in Action is saying that SVM has been added to Mahout as "an experimental implementation". Do you think it's usable for production

Re: SGD vs Naive Bayes for classification

2011-09-09 Thread Zach Richardson
Hi Loic, In my experience, when dealing with smaller datasets (i.e. fewer than, let's say, 1000 training examples, or even fewer than 100 per category), a linear SVM tends to perform better than Mahout's SGD. I would either recommend using Rapid Miner if you want a pr

Re: SGD vs Naive Bayes for classification

2011-09-09 Thread Ted Dunning
On Fri, Sep 9, 2011 at 8:41 AM, Loic Descotte wrote: > ... My goal is to make predictions on thousands of text entries, but with > as little training data as possible (categories may often change, so I will > not always have hundreds of entries for training on each category). > This is very small wi

SGD vs Naive Bayes for classification

2011-09-09 Thread Loic Descotte
Hello, First mail for me on Mahout ML :) I'm working on a classification problem and I'm trying to figure out which algorithm would be better for my needs. I've read that SGD is better than Naive Bayes for small-to-medium data sets. Does it mean that the learning (training) data may be small, or is it for sma
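For readers weighing the two learners Loic asks about, a minimal pure-Python sketch of both on bag-of-words tokens: multinomial Naive Bayes, which only counts term frequencies per category in a single pass, and binary logistic regression trained with plain SGD (Mahout's SGD classifiers are logistic regression learners). This is not Mahout code; the function names, smoothing `alpha`, learning rate, L2 strength, epoch count and toy documents are illustrative assumptions. The Naive Bayes part handles any number of categories, while the SGD part is binary and would need one-vs-rest for more.

```python
import math
import random
from collections import Counter, defaultdict


def train_nb(docs, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing.

    docs: list of (tokens, label). Returns (log_prior, log_lik).
    """
    class_counts = Counter(label for _, label in docs)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        term_counts[label].update(tokens)
        vocab.update(tokens)
    n = len(docs)
    log_prior = {c: math.log(k / n) for c, k in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        denom = sum(term_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((term_counts[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_lik


def predict_nb(model, tokens):
    log_prior, log_lik = model

    def score(c):
        # Tokens never seen in training are ignored.
        return log_prior[c] + sum(log_lik[c][t] for t in tokens if t in log_lik[c])

    return max(log_prior, key=score)


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg_sgd(docs, positive, lr=0.1, l2=1e-3, epochs=50, seed=0):
    """Binary logistic regression trained by plain SGD on the log loss.

    Documents whose label equals `positive` are class 1, all others class 0.
    """
    rng = random.Random(seed)
    w = defaultdict(float)
    b = 0.0
    data = list(docs)
    for _ in range(epochs):
        rng.shuffle(data)
        for tokens, label in data:
            y = 1.0 if label == positive else 0.0
            x = Counter(tokens)
            p = _sigmoid(b + sum(w[t] * v for t, v in x.items()))
            g = p - y  # derivative of the log loss w.r.t. the linear score
            for t, v in x.items():
                # L2 penalty applied only to features present in this document.
                w[t] -= lr * (g * v + l2 * w[t])
            b -= lr * g
    return dict(w), b


def predict_logreg(model, tokens):
    w, b = model
    z = b + sum(w.get(t, 0.0) * v for t, v in Counter(tokens).items())
    return z >= 0.0


# Hypothetical toy corpus.
docs = [
    (["goal", "match", "team"], "sports"),
    (["goal", "score"], "sports"),
    (["stock", "market"], "finance"),
    (["market", "shares", "stock"], "finance"),
]
nb = train_nb(docs)
print(predict_nb(nb, ["goal"]))  # sports
lr_model = train_logreg_sgd(docs, positive="sports")
print(predict_logreg(lr_model, ["goal", "match"]))
```

The contrast in training cost is visible here: Naive Bayes needs one counting pass and essentially no tuning, while the SGD learner makes many passes and depends on the learning rate and regularization, which matters when categories change often and must be retrained.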