On Fri, May 13, 2005 at 06:53:28PM -0700, Steven Manross wrote:
> ***This now works (with minor mods to the SA distro files [SQL.pm] and
> the creation of an additional MS SQL User defined function)
>
> I've mocked up an MS SQL Version of RPAD that could be easily introduced
> into the readme code that creates the bayes tables, and sets the
> version. (please correct the SQL for RPAD if I've incorrectly defined
> part of it).
>
> spamassassin -D <input.txt >output.txt
>
> ...showed bayes activity and marked spam/ham accordingly.
>
> The only problem now being is that when you call MS SQL RPAD, you need
> to do so, like so:
>
> dbo.RPAD('this',5,' ')

If it were straight SQL (i.e. select token, spam_count, ham_count, etc.),
what would the token portion have to look like for MS SQL? Something
like:

  select substring(token,1,len(token)) + replicate(' ',5-len(token)),
         ham_count, spam_count etc etc

?

If so, it would simply be a matter (in 3.1 at least) of creating an
MSSQL.pm module that inherits from SQL.pm and overrides
_token_select_string. Of course, you can still do that with the RPAD
function and make the call:

  select dbo.RPAD(token,5,' '), spam_count, ham_count etc etc

That seems like a reasonable thing to do, and in the future we might
find other MS SQL-specific things we want to override to make things
faster.

FYI, to answer your question about why not just use varchar: we found
that creating variable-length rows really slowed down the SQL, so it is
best to keep things a constant length; things move much faster that way.

Michael
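
For illustration only, here is a minimal sketch of the override idea described above, written in Python rather than Perl so that it stands alone. The class names, the token_query helper, the bayes_token table name, and the assumption that the generic store selects RPAD(token, 5, ' ') are all hypothetical stand-ins; this is not the actual SQL.pm code.

```python
class SQLStore:
    """Stand-in for the generic SQL store (hypothetical, not the real SQL.pm)."""

    def _token_select_string(self):
        # Assumed generic form: the database has a built-in RPAD function.
        return "RPAD(token, 5, ' ')"

    def token_query(self):
        # Build the SELECT around whatever token expression the class supplies.
        return ("SELECT " + self._token_select_string()
                + ", spam_count, ham_count FROM bayes_token")


class MSSQLStore(SQLStore):
    """Hypothetical MS SQL variant: only the token expression changes."""

    def _token_select_string(self):
        # The user-defined function has to be called with its dbo. prefix.
        return "dbo.RPAD(token, 5, ' ')"


if __name__ == "__main__":
    print(SQLStore().token_query())
    print(MSSQLStore().token_query())
```

The point of the design is that everything else in the store stays shared; a database-specific module only replaces the one fragment that differs.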