On Tue, 2011-11-22 at 01:47 +0100, Jesper Wallin wrote:
> On 11/22/2011 12:35 AM, Karsten Bräckelmann wrote:
> > > I also noticed that my old database only had 11k tokens while the new
> > > one has about 60k (both the old and new server have hapaxes enabled and
> > > were trained using a corpus of about 600 spam and 200 ham).
> >
> > Is that "old" database the original one from the previous system, or old
> > as in "before learning from scratch", but *after* migrating the db?
> >
> > I'd guess the latter. 11k tokens is terribly low, and, as you just
> > noticed, even less than learning a handful of messages from scratch.
>
> I meant the original database, created by SA 3.3.2. It has about 11k
> tokens. Also, it runs MySQL 5.5.17 (as that machine runs ArchLinux), and
> I'm not sure about the last comment on the MySQL bug page; it doesn't
> really say whether it's fixed in 5.5.16 or not.

Your Ubuntu system uses 5.1, though.

Anyway, I guess to ever find out whether this might be the issue, Mark or
someone else needs to come up with some funky idea. And regardless, 11k
tokens is terribly low.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}