On Fri, May 13, 2005 at 06:53:28PM -0700, Steven Manross wrote:
> ***This now works (with minor mods to the SA distro files [SQL.pm] and
> the creation of an additional MS SQL User defined function)
> 
> I've mocked up an MS SQL Version of RPAD that could be easily introduced
> into the readme code that creates the bayes tables, and sets the
> version. (please correct the SQL for RPAD if I've incorrectly defined
> part of it). 
> 
> spamassassin -D <input.txt >output.txt
> 
> ...showed bayes activity and marked spam/ham accordingly.
> 
> The only problem now being is that when you call MS SQL RPAD, you need
> to do so, like so:
> 
> dbo.RPAD('this',5,' ')

If it were straight SQL (i.e. select token, spam_count, ham_count, etc.),
what would the token portion have to look like for MS SQL?

Something like:

select substring(token,1,len(token)) + replicate(' ',5-len(token)),
ham_count, spam_count etc etc

?

If so, it would simply be a matter (in 3.1 at least) of creating an
MSSQL.pm module that inherits from SQL.pm and overrides
_token_select_string.  Of course, you can still do that with the RPAD
function, in which case the call becomes:
select dbo.RPAD(token,5,' '), spam_count, ham_count etc etc
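
For illustration, a rough sketch of what such a module might look like.
Nothing here is a shipped file: the MSSQL package name is hypothetical,
the parent package name Mail::SpamAssassin::BayesStore::SQL is my
assumption for where SQL.pm lives, and dbo.RPAD is the user-defined
function from your message.

```perl
# Hypothetical MSSQL.pm: a sketch only, not part of the SA distribution.
package Mail::SpamAssassin::BayesStore::MSSQL;

use strict;
use warnings;

# Assumed package name of the generic SQL.pm storage module.
use Mail::SpamAssassin::BayesStore::SQL;

# Inherit everything from the generic SQL implementation.
our @ISA = qw(Mail::SpamAssassin::BayesStore::SQL);

# Override only the token column expression used in SELECT statements,
# so that the user-defined function is called with its owner prefix,
# as MS SQL requires.
sub _token_select_string {
  return "dbo.RPAD(token, 5, ' ')";
}

1;
```

If that holds up, the module would presumably be selected by pointing
bayes_store_module at the new package name instead of the generic SQL
one, with no further changes to SQL.pm itself.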

Seems like a reasonable thing to do, and in the future we might find
some other MS SQL-specific things we want to override to make things
faster.

FYI, to answer your question about why not just use varchar: we found
that creating variable-length rows really slowed down the SQL, so it is
best to keep things a constant length. Things move much faster that way.

Michael
