Hello Daniel,

<snipped a bit>

> So my real question, after all this, is what have you found to be the
> most appropriate bucket setup? Should I artificially create more
> buckets, or leave it at the minimum.. or am i missing the point
> entirely? or.........

I don't really know what *is* better, but I'll tell you what I did.
For about three days I had 4 buckets and the overall accuracy didn't
go much higher than 80%. Then I thought that the more buckets there
are, the longer the training period would probably need to be and,
what the heck!, I'm only really interested in telling spam from
non-spam. So, as two of the buckets were actually subsets of a third
one, I deleted the two subset ones and reset the statistics and,
after one more day, accuracy was over 90%.

Aside from that, I see no sense in using a Bayesian text classifier
(which is what POPFile actually is) to tell me whether a message is
coming from John Doe or from the TBUDL list: both are legitimate mail,
and that kind of sorting is better and more easily done by TB's
*Sorting* Office.

> Also, am I just lucky, or over the last, say, month or so, has spam
> really slowed down. The last spam i received through my main account
> must have been early last week, and I haven't received any spam in
> either of my hotmail accounts for weeks either... Just lucky, or are
> other people noticing a similar occurrence...

I wouldn't know. I get spam every day, especially in two of my
accounts, and even just one spam message seems too much for me :-(

-- 
Best regards,

Miguel A. Urech (El Escorial - Spain)
Using The Bat! v1.61

________________________________________________
Current version is 1.62 | "Using TBUDL" information:
http://www.silverstones.com/thebat/TBUDLInfo.html