Re: Overtraining effects in NB

2011-10-25 Thread Grant Ingersoll
Robin,

Any luck with this?

On Oct 11, 2011, at 7:22 AM, Robin Anil wrote:
> I am guessing this is on the new naivebayes package. I would like to check
> the data and compare against the old implementation if it's a bug.
>
> On Tue, Oct 11, 2011 at 4:18 PM, Grant Ingersoll wrote:
>
>>
>> On Oct

Re: Overtraining effects in NB

2011-10-11 Thread Grant Ingersoll
On Oct 11, 2011, at 7:22 AM, Robin Anil wrote:
> I am guessing this is on the new naivebayes package. I would like to check
> the data and compare against the old implementation if it's a bug.

Yes, it is. The data is up on Amazon as a public data set; otherwise, just grab a few ASF mail archives

Re: Overtraining effects in NB

2011-10-11 Thread Robin Anil
I am guessing this is on the new naivebayes package. I would like to check the data and compare against the old implementation if it's a bug.

On Tue, Oct 11, 2011 at 4:18 PM, Grant Ingersoll wrote:
>
> On Oct 11, 2011, at 1:47 AM, Robin Anil wrote:
>
> > Could be due to the way normalization is do

Re: Overtraining effects in NB

2011-10-11 Thread Grant Ingersoll
On Oct 11, 2011, at 1:47 AM, Robin Anil wrote:
> Could be due to the way normalization is done.

In what part of the process?

> How is CNB performing?

It's better, like 40% correct, 60% wrong, but still not good.

> Do share the confusion matrices and per-label precision.

Usually on the orde
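Robin's two suggestions, normalization and CNB (Complement Naive Bayes), are most likely the ideas from Rennie et al. (2003), "Tackling the Poor Assumptions of Naive Bayes Text Classifiers"; that reading is an assumption, and the Python below is a minimal illustrative sketch, not the naivebayes package's code. Each label's term weights are estimated from the documents of every *other* label (the complement), Laplace-smoothed with `alpha` (an assumed default of 1.0), and then divided by their L1 norm, one common reading of "weight normalization". Because the weights describe the complement, the predicted label is the one with the *lowest* score. The names `train_cnb` and `classify_cnb` are hypothetical.

```python
import math
from collections import defaultdict


def train_cnb(docs, labels, alpha=1.0):
    """Complement Naive Bayes with L1 weight normalization (Rennie et al. style).

    docs   -- list of {term: count} dicts, one per training document
    labels -- list of label strings, parallel to docs
    alpha  -- Laplace smoothing count added to every term (assumed default)
    Returns {label: {term: weight}}.
    """
    vocab = sorted({t for d in docs for t in d})
    classes = sorted(set(labels))
    total = defaultdict(float)                      # term counts over all labels
    per_class = {c: defaultdict(float) for c in classes}
    for doc, c in zip(docs, labels):
        for t, n in doc.items():
            total[t] += n
            per_class[c][t] += n

    weights = {}
    for c in classes:
        # Complement counts: how often each term occurs in all labels except c.
        comp = {t: total[t] - per_class[c][t] for t in vocab}
        denom = sum(comp.values()) + alpha * len(vocab)
        raw = {t: math.log((comp[t] + alpha) / denom) for t in vocab}
        # Normalize so every label's weight vector has an L1 norm of 1.0.
        norm = sum(abs(w) for w in raw.values())
        weights[c] = {t: w / norm for t, w in raw.items()}
    return weights


def classify_cnb(weights, doc):
    """Return the label whose complement fits the document worst (lowest score)."""
    def score(c):
        return sum(n * weights[c].get(t, 0.0) for t, n in doc.items())
    return min(weights, key=score)
```

Estimating from the complement gives each label a more even amount of training text when there are many labels, which is why CNB is usually suggested for skewed label distributions like the one discussed in this thread.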

Re: Overtraining effects in NB

2011-10-10 Thread Robin Anil
Could be due to the way normalization is done. How is CNB performing? Do share the confusion matrices and per-label precision.

On Mon, Oct 10, 2011 at 11:20 PM, Grant Ingersoll wrote:
> I was trying the Naive Bayes classifier via the build-asf-email.sh file I
> committed the other day on a data s
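Robin asks for confusion matrices and per-label precision. As a hedged illustration (not the output format of Mahout's own tools), the Python below computes both from parallel lists of actual and predicted labels. An overall figure such as "40% correct" can hide a single label absorbing most of the predictions; per-label precision and the matrix's columns make that visible. The function names and the sample labels are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Map each (actual label, predicted label) pair to its number of occurrences."""
    return Counter(zip(actual, predicted))


def per_label_precision(actual, predicted):
    """Precision per label: correct predictions of it / all predictions of it.

    Labels that were never predicted are left out, since their precision is undefined.
    """
    cm = confusion_matrix(actual, predicted)
    times_predicted = Counter(predicted)
    return {lbl: cm[(lbl, lbl)] / n for lbl, n in times_predicted.items()}


def format_matrix(actual, predicted):
    """Plain-text confusion matrix: rows are actual labels, columns predicted labels."""
    labels = sorted(set(actual) | set(predicted))
    cm = confusion_matrix(actual, predicted)
    width = max(len(lbl) for lbl in labels) + 2
    header = " " * width + "".join(lbl.rjust(width) for lbl in labels)
    rows = [a.ljust(width) + "".join(str(cm[(a, p)]).rjust(width) for p in labels)
            for a in labels]
    return "\n".join([header] + rows)


# Hypothetical labels and predictions, for illustration only.
actual = ["dev", "dev", "user", "user", "user", "commits"]
predicted = ["dev", "commits", "user", "commits", "user", "commits"]
print(format_matrix(actual, predicted))
print(per_label_precision(actual, predicted))
```

In this made-up data, "commits" is predicted three times but correct only once, so its precision is 1/3 even though the other two labels score 1.0.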

Overtraining effects in NB

2011-10-10 Thread Grant Ingersoll
I was trying the Naive Bayes classifier via the build-asf-email.sh file I committed the other day, on a data set with a fairly significant variation in the number of messages per training label, and I am noticing (I still need to validate this more) that the label with the fewest examples is o
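Grant's observation concerns how the label with the fewest training examples fares. One hedged way to quantify that kind of effect is to compare each label's share of the training set with its share of the classifier's predictions, and to report the ratio between the largest and smallest label counts. The Python below is a minimal sketch of that check; the function names and sample data are hypothetical and are not from the thread.

```python
from collections import Counter


def imbalance_ratio(train_labels):
    """Largest per-label training count divided by the smallest."""
    counts = Counter(train_labels).values()
    return max(counts) / min(counts)


def label_shares(train_labels, predicted_labels):
    """For each label, return (share of training examples, share of predictions).

    A label whose prediction share is well above its training share is being
    chosen more often than its share of the training data.
    """
    train, pred = Counter(train_labels), Counter(predicted_labels)
    n_train, n_pred = len(train_labels), len(predicted_labels)
    return {lbl: (train[lbl] / n_train, pred[lbl] / n_pred)
            for lbl in sorted(set(train) | set(pred))}


# Hypothetical numbers for illustration only; they are not results from the thread.
train_labels = ["dev"] * 8 + ["user"] * 2
predicted_labels = ["user"] * 6 + ["dev"] * 4
print(imbalance_ratio(train_labels))
print(label_shares(train_labels, predicted_labels))
```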