Bayesian filter - bug fixed (still a test pre-release!)

Alexey N. Vinogradov Mon, 31 Mar 2003 20:54:35 -0800

Hello, tbdev.

One bug has been fixed in the "baesyan" filter.
If a letter contained a token consisting entirely of "!" characters,
an error occurred during "degeneration" and the filter failed. Because
of this failure, any letter containing such a token was treated as
"non-spam".
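The post doesn't show the plugin's code, so the following is only a hedged illustration. It assumes "degeneration" means what Paul Graham described in "Better Bayesian Filtering": when a token isn't in the dictionary, the filter retries progressively more general variants (fewer trailing "!", lower case). The names `degenerate` and `lookup`, and the 0.4 default for unseen tokens, are hypothetical. The point it shows is that stripping the trailing "!" from a token like "!!!" leaves an empty string, so the sketch drops empty variants instead of failing on them.

```python
def degenerate(token: str) -> list[str]:
    """Return a token and its more general variants, most specific first.

    Hypothetical scheme: collapse trailing '!' runs to one '!', then drop
    them, then add lower-case forms. Empty variants are discarded, so a
    token made only of '!' yields just itself instead of an empty string.
    """
    candidates = [token]
    stripped = token.rstrip("!")
    if token.endswith("!") and stripped:  # guard: "!!!" strips to ""
        candidates.append(stripped + "!")
        candidates.append(stripped)
    for c in list(candidates):
        candidates.append(c.lower())
    # Remove duplicates and empty strings while keeping the order.
    result: list[str] = []
    for c in candidates:
        if c and c not in result:
            result.append(c)
    return result


def lookup(token: str, probs: dict[str, float], default: float = 0.4) -> float:
    """Return the spam probability of the first known variant of the token."""
    for variant in degenerate(token):
        if variant in probs:
            return probs[variant]
    return default


print(degenerate("FREE!!!"))
print(lookup("!!!!", {}))  # falls back to the default instead of failing
```

The design choice here is to treat an all-"!" token as a token in its own right: it either has its own statistics or falls back to the neutral default, and it never turns into an empty lookup key.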


You can download the fixed version here:

http://klirik.narod.ru/arc/baesnolog.tbp

http://klirik.narod.ru/arc/baes.tbp

(I still recommend using the logged version, baes.tbp, so that you
can send me a log if any bug arises.)

So far, no other serious errors have been found.

In my own testing since the first build, I have received 92 spam
letters and about 25 non-spam letters (now you understand why I began
writing the filter :). Among these there were no false positives
(i.e. none of my good mail was accidentally deleted as spam) and 1
false negative (i.e. one spam letter reached my mailbox). There were
also about 10 further false negatives caused by the just-fixed bug; I
have since refiltered those letters, and all of them were classified
as spam. So the overall effectiveness (for the moment) is:

0%          (0 of 25) false positives and
1.1%        (1 of 92) false negatives.
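These figures are plain error rates: misclassified letters divided by letters of that class. A small sketch of the arithmetic (the function name is just for illustration):

```python
def rate_percent(errors: int, total: int) -> float:
    """Error rate as a percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * errors / total, 1)


fp = rate_percent(0, 25)  # good letters wrongly classified as spam
fn = rate_percent(1, 92)  # spam letters that reached the mailbox
print(f"false positives: {fp}%, false negatives: {fn}%")
```

1 of 92 is about 1.087%, which rounds to the 1.1% quoted above.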

I use a training base of 650 spam and about 800 non-spam letters.
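The post doesn't say exactly how the filter turns the base into probabilities. Because "degeneration" comes from Paul Graham's work, one plausible reading is his "A Plan for Spam" scheme, sketched below as an assumption, not as this plugin's actual formula. The sketch shows why the base sizes matter: token counts are divided by the number of spam and non-spam letters, so the 650/800 imbalance does not bias the result. The function names are hypothetical.

```python
from math import prod

# Base sizes quoted in the post; everything else here is an assumption.
N_SPAM = 650
N_GOOD = 800


def token_spam_prob(good_count: int, spam_count: int,
                    n_good: int = N_GOOD, n_spam: int = N_SPAM):
    """Graham-style probability that a letter containing the token is spam.

    Good-mail counts are doubled to bias against false positives; tokens
    seen fewer than 5 times (after doubling) are ignored by returning None.
    """
    g = 2 * good_count
    b = spam_count
    if g + b < 5:
        return None
    # Normalize by base size so the spam/non-spam imbalance does not skew p.
    bad_freq = min(1.0, b / n_spam)
    good_freq = min(1.0, g / n_good)
    return max(0.01, min(0.99, bad_freq / (good_freq + bad_freq)))


def combine(probs: list[float]) -> float:
    """Naive-Bayes combination of individual token probabilities."""
    p = prod(probs)
    q = prod(1 - x for x in probs)
    return p / (p + q)


print(token_spam_prob(good_count=3, spam_count=40))
```

Graham combines only the 15 tokens whose probabilities are furthest from 0.5; that selection step is left out here for brevity.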

Plans for the future:

   1. A new rbd-generating engine (same principle, but with a changed
   user interface and some added options). It also seems worthwhile
   to automatically recognize PGP- or S/MIME-encrypted messages and
   handle them somehow: either discard them entirely or at least store
   them as hash values to keep the dictionary small.

   2. Filter settings will be stored in the registry. Alternatively: I
   found that if "TBP_NeedConfig" returns -1, The Bat! itself adds a
   [Filterdata] section to TBPlugin.INI. This section is currently
   empty, but I expect that The Bat! developers will eventually make it
   possible to store settings locally for each mailbox (settings in the
   registry would be global).

   3. Adapt rbd generation to other mailbase formats because, as far as
   I know, "SecureBat" also exists and keeps its mailbases encrypted.
   For that program, the problem could be solved by importing other
   mailbase formats, for example unix-mailbox.

   4. A self-training feature. I imagine it as a prompt to the user
   after every 50 received letters (for example), asking them to
   confirm the grades of all letters or, alternatively, only of the
   questionable ones, i.e. letters automatically scored within some
   definite "spaminess" interval (21-80%, for example). The confirmed
   grades will then be appended to regard.rbd. That way the base will
   always be "fresh", and it won't be necessary to run the
   rbd-generating engine to refresh it.
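Item 4 is only a proposal. The following minimal sketch shows how the batching and the 21-80% "questionable" band might work, assuming the filter exposes a spaminess score in [0, 1] for each letter. `Letter`, `SelfTrainer` and `append_grades` are hypothetical names, and an in-memory list stands in for regard.rbd, whose format the post does not describe.

```python
from dataclasses import dataclass, field


@dataclass
class Letter:
    msg_id: str
    spaminess: float  # assumed filter score in [0, 1]


@dataclass
class SelfTrainer:
    batch_size: int = 50  # "after every 50 received letters"
    low: float = 0.21     # questionable band from the post: 21-80%
    high: float = 0.80
    pending: list = field(default_factory=list)

    def observe(self, letter: Letter) -> list:
        """Queue a received letter. When a batch is full, return the letters
        whose score lies in the questionable band, for the user to confirm."""
        self.pending.append(letter)
        if len(self.pending) < self.batch_size:
            return []
        batch, self.pending = self.pending, []
        return [x for x in batch if self.low <= x.spaminess <= self.high]


def append_grades(base: list, graded: list) -> None:
    """Append user-confirmed (letter, is_spam) pairs to the base.
    The list is a stand-in for regard.rbd, whose format is unknown."""
    for letter, is_spam in graded:
        base.append((letter.msg_id, is_spam))


trainer = SelfTrainer()
base: list = []
for i, score in enumerate([0.05, 0.5, 0.95] * 17):  # 51 letters
    ask = trainer.observe(Letter(f"m{i}", score))
    if ask:
        # A real plugin would show a dialog; here the user "confirms" spam.
        append_grades(base, [(x, True) for x in ask])
print(len(base), "letters confirmed,", len(trainer.pending), "pending")
```

Only the questionable letters are offered for confirmation, which keeps the prompt short; confirming every letter in the batch would just mean returning `batch` unfiltered.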

   These are my own ideas. Does anyone else have some?
   

-- 
Sincerely,
 Alexey.
Using TB 1.63b7 on WinXP SP1 Corp + MUI RU, spelling by ORFO2002
  mailto:[EMAIL PROTECTED]


________________________________________________
Current version is 1.62 | "Using TBDEV" information:
http://www.silverstones.com/thebat/TBUDLInfo.html
