Thanks a lot for your interest and time.
I'm computer-less for the coming week, but I'll run a few more
experiments and post the data as soon as I'm back home.
Thanks.
Benjamin
On 17 Sept. 2011 at 00:24, Ted Dunning ted.dunn...@gmail.com wrote:
Benjamin,
Can you post your actual training
Hello,
I'm trying out different classifiers on a classic text-classification
problem, very close to the 20 Newsgroups one.
I get much better results with Weka's NaiveBayesMultinomial than with
Mahout's bayes classifier.
The main problem is that my data is unbalanced. I know
Funny, I was just doing something similar using the ASF email archives. My
initial run, which tried to classify per mailing list, gave pretty low
performance (61%), but when I classified per project instead, I got quite high
performance. In my case, I think there was too much overlap between mailing
Did you try Complementary Naive Bayes (CNB)? I'm guessing the multinomial
naive Bayes mentioned here is a CNB-like implementation rather than plain NB.
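For context on the CNB suggestion: Complement Naive Bayes estimates each class's word probabilities from the documents *not* in that class, which is intended to reduce the bias toward classes with more training data. Below is a minimal, self-contained Python sketch contrasting it with plain multinomial NB on tokenized documents. The function names and toy data are hypothetical; this is not Mahout's or Weka's implementation, and it omits the weight normalization and TF-IDF/length transforms that fuller CNB implementations usually add.

```python
import math
from collections import Counter, defaultdict


def train_counts(docs, labels):
    """Collect per-class word counts, per-class document counts and the vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter(labels)
    vocab = set()
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)  # doc is a list of tokens; repeats are counted
        vocab.update(doc)
    return word_counts, doc_counts, vocab


def mnb_predict(doc, word_counts, doc_counts, vocab, alpha=1.0):
    """Multinomial NB: argmax over c of log P(c) + sum_w log P(w | c)."""
    n_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for c in doc_counts:
        total = sum(word_counts[c].values())
        score = math.log(doc_counts[c] / n_docs)  # class prior
        for w in doc:
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[c][w] + alpha) / (total + alpha * len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best


def cnb_predict(doc, word_counts, doc_counts, vocab, alpha=1.0):
    """Complement NB: estimate word probabilities from every class except c,
    then pick the class whose complement explains the document worst."""
    best, best_score = None, math.inf
    for c in doc_counts:
        complement = Counter()
        for other in doc_counts:
            if other != c:
                complement.update(word_counts[other])
        total = sum(complement.values())
        score = 0.0  # no class-prior term in this basic variant
        for w in doc:
            if w in vocab:
                score += math.log((complement[w] + alpha) / (total + alpha * len(vocab)))
        if score < best_score:
            best, best_score = c, score
    return best


# Toy, deliberately unbalanced data: three "big" documents, one "small".
docs = [["a", "a", "b"], ["a", "c"], ["a", "b"], ["d", "d"]]
labels = ["big", "big", "big", "small"]
model = train_counts(docs, labels)
print(mnb_predict(["d"], *model), cnb_predict(["d"], *model))
```

Note that with only two classes the complement of one class is simply the other class, so this basic CNB reduces to multinomial NB without the prior; the two diverge more once there are three or more classes.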
On Fri, Sep 16, 2011 at 5:30 PM, Benjamin Rey benjamin@c-optimal.com wrote:
Hello,
I'm trying out different classifiers on a classic problem
Unfortunately, CNB gives me the same 66% accuracy.
I've pasted the commands for Mahout and Weka below.
I also tried removing the biggest class; it helps, but then the second-biggest
class is the one that gets overwhelmingly predicted. Mahout's bayes classifier
seems to favor the biggest class heavily (more than its prior alone would explain),
Benjamin,
Can you post your actual training data on Dropbox or some other place so
that we can replicate the problem?
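On the observation above that the biggest class gets predicted far more often than its share of the data: one quick way to quantify this, for either tool's output, is to compare each class's share of the true labels with its share of the predictions, alongside per-class recall. A minimal Python sketch; `prediction_bias` is a hypothetical helper (not part of Mahout or Weka) and assumes the true and predicted test-set labels have already been exported as two equal-length lists.

```python
from collections import Counter


def prediction_bias(y_true, y_pred):
    """Per class: (share of true labels, share of predictions, recall)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Correct predictions, counted per true class.
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    report = {}
    for c in sorted(true_counts.keys() | pred_counts.keys()):
        recall = hits[c] / true_counts[c] if true_counts[c] else 0.0
        report[c] = (true_counts[c] / n, pred_counts[c] / n, recall)
    return report


# Illustrative labels only: a classifier that always predicts the majority class.
y_true = ["big", "big", "big", "small"]
y_pred = ["big", "big", "big", "big"]
for label, (true_share, pred_share, recall) in prediction_bias(y_true, y_pred).items():
    print(f"{label}: true {true_share:.2f}, predicted {pred_share:.2f}, recall {recall:.2f}")
```

A class whose predicted share sits well above its true share, while the other classes show low recall, is the pattern described in this thread.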
On Fri, Sep 16, 2011 at 3:38 PM, Benjamin Rey benjamin@c-optimal.com wrote:
Unfortunately, CNB gives me the same 66% accuracy.
I've pasted the commands for Mahout and Weka