Re: 92% accuracy on Weka NaiveBayesMultinomial vs 66% with Mahout bayes

2011-09-17 Thread Benjamin Rey
Thanks a lot for your interest and time. I'm computer-less for the coming week, but I'll run a few more experiments and post the data as soon as I'm back home. Thanks. Benjamin On 17 Sept. 2011 at 00:24, Ted Dunning ted.dunn...@gmail.com wrote: Benjamin, Can you post your actual training

92% accuracy on Weka NaiveBayesMultinomial vs 66% with Mahout bayes

2011-09-16 Thread Benjamin Rey
Hello, I'm trying out different classifiers on a classic text classification problem very close to the 20 Newsgroups one. I get much better results with Weka NaiveBayesMultinomial than with Mahout bayes. The main problem comes from the fact that my data is unbalanced. I know

Re: 92% accuracy on Weka NaiveBayesMultinomial vs 66% with Mahout bayes

2011-09-16 Thread Grant Ingersoll
Funny, I was just doing a similar thing using the ASF email archives. My initial run, which tried to classify per mailing list, gave pretty low performance (61%), but when I classified per project, I got quite high performance. In my case, I think there was too much overlap between mailing

Re: 92% accuracy on Weka NaiveBayesMultinomial vs 66% with Mahout bayes

2011-09-16 Thread Robin Anil
Did you try complementary naive Bayes (CNB)? I am guessing the multinomial naive Bayes mentioned here is a CNB-like implementation and not NB. On Fri, Sep 16, 2011 at 5:30 PM, Benjamin Rey benjamin@c-optimal.com wrote: Hello, I'm giving a try to different classifiers for a classical problem

Re: 92% accuracy on Weka NaiveBayesMultinomial vs 66% with Mahout bayes

2011-09-16 Thread Benjamin Rey
Unfortunately CNB gives me the same 66% accuracy. I paste the commands for mahout and weka below. I also tried removing the biggest class; it helps, but then the 2nd biggest class is overwhelmingly predicted. Mahout bayes seems to strongly favor the biggest class (more than its prior),
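
One way to check the observation above (a classifier favoring the biggest class more than its prior would justify) is to compare, per class, the share of true labels with the share of predictions, alongside per-class recall. The following is only a minimal sketch: the function name `class_bias_report` and the toy labels are hypothetical and are not taken from the thread or from Mahout or Weka output.

```python
from collections import Counter


def class_bias_report(true_labels, predicted_labels):
    """Per class: true share (prior), predicted share, and recall."""
    n = len(true_labels)
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    # Count correct predictions per true class.
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    report = {}
    for label in sorted(true_counts):
        report[label] = {
            "prior": true_counts[label] / n,
            "predicted_share": pred_counts[label] / n,
            "recall": correct[label] / true_counts[label],
        }
    return report


# Hypothetical toy data: 6 documents in class "big", 4 in class "small".
true_labels = ["big"] * 6 + ["small"] * 4
predicted_labels = ["big"] * 6 + ["big", "big", "big", "small"]

report = class_bias_report(true_labels, predicted_labels)
accuracy = sum(
    t == p for t, p in zip(true_labels, predicted_labels)
) / len(true_labels)

for label, stats in report.items():
    print(label, stats)
print("accuracy", accuracy)
```

A predicted share well above the prior for the largest class, combined with low recall on the smaller classes, would point to the kind of bias described here rather than to uniformly noisy predictions.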

Re: 92% accuracy on Weka NaiveBayesMultinomial vs 66% with Mahout bayes

2011-09-16 Thread Ted Dunning
Benjamin, Can you post your actual training data on dropbox or some other place so that we can replicate the problem? On Fri, Sep 16, 2011 at 3:38 PM, Benjamin Rey benjamin@c-optimal.com wrote: Unfortunately CNB gives me the same 66% accuracy. I paste the commands for mahout and weka