Andrew Sykes wrote:
Stefano,

E.g.: with a basic "standard" corpus that considers "v1agra" spam and has no information about "James" being ham, your message would have been deleted by my Bayesian filter.

How does this differ from the following scenario...

1/ I turn on the filter
2/ I send a message with "v1agra" to the filter.
3/ You reply to this message

To "train" the Bayesian filter you have to send it more than one message.
You should choose a sample from your own mail, for example 100 different messages you know are spam and 100 different messages you know are not spam, and then enable "message tagging" by the Bayesian mailets to check whether your training is going well.

Your 100 good messages would probably include some other messages from this mailing list, so the Bayesian algorithm can judge whether this message is more likely a "viagra" spam message or a good message.

My corpus, for example, has not marked the messages in this thread as spam.
Probably the words "bayesian", "james", "message" and "algorithm" have balanced the effect of the word "v1agra".
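
To illustrate how several ham-leaning words can outweigh one spam-leaning word, here is a minimal sketch of a Graham-style naive Bayes score. It is only an assumption about the general technique, not the code of the James mailets; the tiny corpora, the function names and the 0.01/0.99 clamping values are all made up for the example.

```python
from collections import Counter
from math import prod

# Hypothetical toy corpora; a real corpus would hold hundreds of messages.
SPAM = ["buy v1agra now", "cheap v1agra offer"]
HAM = ["james bayesian mailet question", "james message filter setup"]

def count_tokens(messages):
    """Count in how many messages each lowercase token appears."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts

SPAM_COUNTS = count_tokens(SPAM)
HAM_COUNTS = count_tokens(HAM)

def token_probability(token):
    """Estimate how spam-like a single token is, clamped to [0.01, 0.99]."""
    s = SPAM_COUNTS[token] / len(SPAM)
    h = HAM_COUNTS[token] / len(HAM)
    if s + h == 0:
        return 0.4  # assumed neutral-ish value for tokens never seen before
    return min(0.99, max(0.01, s / (s + h)))

def spam_score(text):
    """Combine the per-token probabilities into one message probability."""
    probs = [token_probability(t) for t in set(text.lower().split())]
    p = prod(probs)
    q = prod(1.0 - x for x in probs)
    return p / (p + q)

alone = spam_score("v1agra")
in_thread = spam_score("v1agra bayesian james message")
```

With these toy corpora, "v1agra" on its own scores high, while the same word next to "bayesian", "james" and "message" scores low, because each ham-only token pulls the combined probability down.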

You can safely run your tests if you start feeding your corpus and activate the following configuration:

<mailet match="All" class="BayesianAnalysis" onMailetException="ignore">
    <repositoryPath>db://maildb</repositoryPath>
    <maxSize>200000</maxSize>
    <headerName>X-MessageIsSpamProbability</headerName>
    <ignoreLocalSender>true</ignoreLocalSender>
</mailet>

<mailet match="CompareNumericHeaderValue=X-MessageIsSpamProbability > 0.90" class="AddHeader" onMatchException="noMatch">
    <name>X-MessageIsSpam</name>
    <value>true</value>
</mailet>

This way James will add an X-MessageIsSpamProbability header to your messages, and when its value exceeds 0.90 it will also add an "X-MessageIsSpam: true" header. Nothing is deleted; messages are only tagged. You can then add a rule in your email client and watch what James puts there. When James produces a false positive, send that message to the ham feeder to train it. When James fails to catch a spam message, send it to the spam feeder. Repeat until you are satisfied with the matching accuracy.
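
If you prefer to check the tagged messages with a script instead of a mail client rule, something like the following sketch could help. It assumes the header value is a plain decimal number such as 0.97 (I have not verified the exact format James writes); the function name `triage` and the returned strings are only illustrative.

```python
from email import message_from_string

THRESHOLD = 0.90  # same cut-off as the CompareNumericHeaderValue matcher above

def triage(raw_message, is_really_spam):
    """Tell which feeder, if any, a tagged message should be sent to."""
    msg = message_from_string(raw_message)
    header = msg.get("X-MessageIsSpamProbability")
    if header is None:
        return "untagged"
    flagged = float(header.strip()) > THRESHOLD
    if flagged and not is_really_spam:
        return "send to ham feeder"   # false positive
    if not flagged and is_really_spam:
        return "send to spam feeder"  # missed spam
    return "ok"

sample = (
    "Subject: Re: bayesian filter\n"
    "X-MessageIsSpamProbability: 0.97\n"
    "\n"
    "A reply that mentions v1agra.\n"
)
result = triage(sample, is_really_spam=False)
```

Here `is_really_spam` is your own judgement of the message, so the script only automates the comparison with the threshold, not the decision itself.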

Stefano

