Re: how to create maildb?

Stefano Bagnara Sun, 19 Feb 2006 09:34:42 -0800

Andrew Sykes wrote:

Stefano,
E.g.: with a basic "standard" corpus that considers v1agra spam and has no information about James being ham, your message would have been deleted by my bayesian.
How does this differ from the following scenario...

1/ I turn on the filter
2/ I send a message with "v1agra" to the filter.
3/ You reply to this message


To "train" the bayesian filter you have to send it more than one message.

You should choose from your messages a sample of, for example, 100 different messages you know are spam and 100 different messages you know are not spam. Then you should add "message tagging" by the bayesian mailets to check whether your training is going well.

Your 100 good messages would probably include some other messages from this mailing list, and the bayesian algorithm will try to work out whether this message is more likely a "viagra" spam message or a good message.
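To make the effect of the two training sets concrete, here is a minimal Python sketch of how per-token spam probabilities could be estimated from a spam corpus and a ham corpus. It is not James's actual code: the function names, the tiny corpora and the neutral value 0.5 for unseen tokens are all assumptions made for illustration, and real filters usually add smoothing or clamping on top of this.

```python
from collections import Counter

def token_counts(messages):
    """Count in how many messages each lowercase token appears."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts

def token_spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Simplified estimate: spam rate / (spam rate + ham rate)."""
    spam_rate = spam_counts[token] / n_spam
    ham_rate = ham_counts[token] / n_ham
    if spam_rate + ham_rate == 0:
        return 0.5  # assumption: a token never seen in training is neutral
    return spam_rate / (spam_rate + ham_rate)

# Hypothetical training samples (a real corpus would hold ~100 of each).
spam_corpus = ["buy v1agra now", "cheap v1agra offer"]
ham_corpus = ["james bayesian mailet question",
              "james message about v1agra filter"]

spam_counts = token_counts(spam_corpus)
ham_counts = token_counts(ham_corpus)

p_v1agra = token_spam_probability("v1agra", spam_counts, ham_counts,
                                  len(spam_corpus), len(ham_corpus))
p_james = token_spam_probability("james", spam_counts, ham_counts,
                                 len(spam_corpus), len(ham_corpus))
p_unseen = token_spam_probability("unseen", spam_counts, ham_counts,
                                  len(spam_corpus), len(ham_corpus))
```

With these made-up samples, "v1agra" leans towards spam but is not certain, because one good message also contains it, while "james" appears only in ham.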


My corpus, for example, has not marked the messages in this thread as spam.

Probably the words "bayesian", "james", "message" and "algorithm" have balanced the effect of the "v1agra" word.
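The balancing effect can be sketched in Python with the usual naive Bayes combination of per-token probabilities. This is only an illustration: the per-token values below are invented, not taken from any real corpus, and James's analyzer may select and weight tokens differently.

```python
from math import prod

def combined_spam_probability(token_probs):
    """Combine per-token spam probabilities with the naive Bayes rule."""
    probs = list(token_probs)
    spam = prod(probs)                    # evidence that the message is spam
    ham = prod(1.0 - p for p in probs)    # evidence that the message is ham
    return spam / (spam + ham)

# Hypothetical per-token spam probabilities (assumed values).
token_probs = {
    "v1agra": 0.99,
    "bayesian": 0.05,
    "james": 0.02,
    "message": 0.30,
    "algorithm": 0.10,
}

alone = combined_spam_probability([token_probs["v1agra"]])
together = combined_spam_probability(token_probs.values())
```

Under these assumed values, "v1agra" on its own scores very high, but the combined score of all five words falls well below a 0.90 threshold, because the ham-leaning words outweigh it.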

You can safely run your tests if you start feeding your corpus and activate the following configuration:

         <mailet match="All" class="BayesianAnalysis" onMailetException="ignore">
            <repositoryPath>db://maildb</repositoryPath>
            <maxSize>200000</maxSize>
            <headerName>X-MessageIsSpamProbability</headerName>
            <ignoreLocalSender>true</ignoreLocalSender>
         </mailet>

         <mailet match="CompareNumericHeaderValue=X-MessageIsSpamProbability > 0.90" class="AddHeader" onMatchException="noMatch">
            <name>X-MessageIsSpam</name>
            <value>true</value>
         </mailet>

This way James will start adding an X-MessageIsSpamProbability header to your messages, and when this value is above 0.90 it will also add an "X-MessageIsSpam: true" header. You can then add a rule in your email client and start looking at what James puts there. When James flags a good message as spam (a false positive), send that message to the ham feeder to train it. When James fails to flag spam, send that message to the spam feeder. Repeat until you are satisfied with the matching accuracy.
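The tag-and-feed-back loop described above can be summarised in a short Python sketch. It only models the decision logic, not James itself: the function names and the "ham"/"spam" return values are hypothetical, and the strict comparison follows the `> 0.90` condition in the matcher.

```python
SPAM_THRESHOLD = 0.90  # same value as in the CompareNumericHeaderValue matcher

def tag_headers(probability):
    """Return the headers the two mailets would add for a given score."""
    headers = {"X-MessageIsSpamProbability": str(probability)}
    if probability > SPAM_THRESHOLD:  # strict comparison, as in the matcher
        headers["X-MessageIsSpam"] = "true"
    return headers

def feeder_for(headers, actually_spam):
    """Pick the feeder a misclassified message should be sent to.

    Returns "ham" for a false positive, "spam" for missed spam,
    and None when the tagging was already correct.
    """
    tagged = headers.get("X-MessageIsSpam") == "true"
    if tagged and not actually_spam:
        return "ham"
    if not tagged and actually_spam:
        return "spam"
    return None

# A good message wrongly scored 0.95 goes to the ham feeder;
# a spam message scored 0.40 goes to the spam feeder.
false_positive = feeder_for(tag_headers(0.95), actually_spam=False)
missed_spam = feeder_for(tag_headers(0.40), actually_spam=True)
```

Only the misclassified messages are fed back here, which mirrors the manual routine of checking the tagged mail in your client and correcting the filter where it was wrong.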


Stefano


