Andrew Sykes wrote:
Stefano,
E.g., with a basic "standard" corpus that considers v1agra spam and has no
information about James being ham, your message would have been deleted
by my Bayesian filter.
How does this differ from the following scenario...
1/ I turn on the filter
2/ I send a message with "v1agra" to the filter.
3/ You reply to this message
To "train" the Bayesian filter you have to send it more than one message.
You should pick a sample from your own mail, for example 100
different messages you know are spam and 100 different messages you know
are not spam. Then you should enable "message tagging" by the Bayesian
mailets to see whether your training is going well.
Your 100 good messages would probably include some other messages
from this mailing list, and the Bayesian algorithm will try to work out
whether this message is more likely a "viagra" spam message or a good
message.
My corpus, for example, has not marked the messages in this thread as spam.
Probably the words "bayesian", "james", "message" and "algorithm" have
balanced the effect of the word "v1agra".
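As a rough illustration of that balancing, assume the filter combines the
per-word spam probabilities p_i in the style of Paul Graham's "A Plan for
Spam" (this is an assumption about the implementation, not something I have
checked here), and assume made-up values of 0.99 for "v1agra" and 0.01 each
for "james" and "bayesian":

```latex
P(\text{spam})
  = \frac{\prod_i p_i}{\prod_i p_i + \prod_i (1 - p_i)}
  = \frac{0.99 \cdot 0.01 \cdot 0.01}
         {0.99 \cdot 0.01 \cdot 0.01 + 0.01 \cdot 0.99 \cdot 0.99}
  = \frac{0.000099}{0.000099 + 0.009801}
  = 0.01
```

With these illustrative numbers, two strongly "ham" words are enough to pull
the message score down to 0.01 even though it contains an obvious spam word.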
You can safely run your tests if you start feeding your corpus and
activate the following configuration, which only tags messages and never
deletes them:
<mailet match="All" class="BayesianAnalysis"
        onMailetException="ignore">
  <repositoryPath>db://maildb</repositoryPath>
  <maxSize>200000</maxSize>
  <headerName>X-MessageIsSpamProbability</headerName>
  <ignoreLocalSender>true</ignoreLocalSender>
</mailet>
<mailet
    match="CompareNumericHeaderValue=X-MessageIsSpamProbability > 0.90"
    class="AddHeader" onMatchException="noMatch">
  <name>X-MessageIsSpam</name>
  <value>true</value>
</mailet>
This way James will start adding an X-MessageIsSpamProbability header to
your messages, and when this value is greater than 0.90 it will also add an
"X-MessageIsSpam: true" header. You then add a rule in your email client
and start looking at what James put there. When James produces false
positives, send those messages to the ham feeder to train it. When James
fails to catch spam, send those messages to the spam feeder. Do that
until you are satisfied with the matching accuracy.
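
For reference, a minimal sketch of what the two feeders could look like,
assuming the BayesianAnalysisFeeder mailet with its feedType parameter. The
recipient addresses are hypothetical placeholders, and the repositoryPath is
assumed to be the same one used by BayesianAnalysis above. Please check it
against the documentation for your James version before using it:

```xml
<!-- Hypothetical address: forward false positives (good mail) here -->
<mailet match="RecipientIs=not.spam@yourdomain.example"
        class="BayesianAnalysisFeeder">
  <!-- Assumed to match the repository used by BayesianAnalysis -->
  <repositoryPath>db://maildb</repositoryPath>
  <feedType>ham</feedType>
  <maxSize>200000</maxSize>
</mailet>
<!-- Hypothetical address: forward missed spam here -->
<mailet match="RecipientIs=spam@yourdomain.example"
        class="BayesianAnalysisFeeder">
  <repositoryPath>db://maildb</repositoryPath>
  <feedType>spam</feedType>
  <maxSize>200000</maxSize>
</mailet>
```

With something like this in place, retraining is just a matter of forwarding
a misclassified message to the appropriate address.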
Stefano