From Ramchandra's original message it sounds as if the corpus has only been trained with 'spam' messages. The Bayesian filter *needs* both spam AND ham (preferably in equal measure) before it can give sensible results.
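
As a rough illustration of why a spam-only corpus misbehaves, here is a minimal sketch of a Graham-style token score. It is not taken from the James source; the function names, the 0.4 default for unseen tokens, and the clamping to [0.01, 0.99] are assumptions based on the general technique.

```python
def token_spam_probability(spam_count, ham_count, n_spam, n_ham):
    """Graham-style per-token spam probability, clamped to [0.01, 0.99]."""
    # Relative frequency of the token in each corpus (0 when a corpus is empty).
    spam_freq = min(1.0, spam_count / n_spam) if n_spam else 0.0
    # Ham counts are doubled to bias the score against false positives.
    ham_freq = min(1.0, 2 * ham_count / n_ham) if n_ham else 0.0
    if spam_freq + ham_freq == 0:
        return 0.4  # token never seen: assumed neutral default
    p = spam_freq / (spam_freq + ham_freq)
    return max(0.01, min(0.99, p))


def message_spam_probability(token_probs):
    """Combine per-token probabilities with the naive Bayes product rule."""
    prod_spam = 1.0
    prod_ham = 1.0
    for p in token_probs:
        prod_spam *= p
        prod_ham *= 1.0 - p
    return prod_spam / (prod_spam + prod_ham)


# Corpus trained with spam only: 100 spam messages, no ham at all.
spam_only = [token_spam_probability(c, 0, 100, 0) for c in (30, 5, 1)]

# Corpus trained with both: the same tokens also appear in 100 ham messages.
balanced = [
    token_spam_probability(s, h, 100, 100)
    for s, h in ((30, 40), (5, 20), (1, 10))
]

print(message_spam_probability(spam_only))
print(message_spam_probability(balanced))
```

With no ham in the corpus, every token that has ever appeared in a spam message is pinned at the maximum score, so any ordinary mail sharing a few of those words ends up above a 0.99 threshold.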

Secondly, by today's standards the Bayesian code is a little naive; in particular, it makes no attempt to decode base64-encoded content. This means that if your spams contain a lot of images, a lot of random-looking strings are added to the corpus, which makes it more likely that such strings will also occur in ham messages.
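
To make the base64 point concrete, here is a small sketch using Python's standard `email` module. The sample message and the tokenizer regex are invented for illustration, and it uses a base64-encoded text part rather than an image to keep it short.

```python
import base64
import re
from email import message_from_string

# Hypothetical message whose text body is base64-encoded.
RAW = (
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + base64.b64encode(b"cheap pills buy now").decode("ascii")
    + "\n"
)


def tokens(text):
    """Very simple tokenizer: runs of letters, digits and a few symbols."""
    return re.findall(r"[A-Za-z0-9$'-]+", text)


msg = message_from_string(RAW)

# What a filter sees if it tokenizes the body without decoding it.
raw_tokens = tokens(msg.get_payload())

# What it sees if the transfer encoding is undone first.
decoded_tokens = tokens(msg.get_payload(decode=True).decode("utf-8"))

print(raw_tokens)
print(decoded_tokens)
```

Without decoding, the filter learns one opaque string instead of the actual words; for binary parts such as images there are no words to recover at all, so the encoded text only adds noise to the corpus.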

Thirdly, it has no support for n-grams, which means it has a very hard time analyzing emails rich in multi-byte UTF-8 text, such as Chinese.
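
A short sketch of what n-gram tokenization would buy for such text. The helper names and the sample phrase are made up for illustration; this is not something James provides.

```python
def whitespace_tokens(text):
    """Split on whitespace, as a word-oriented tokenizer would."""
    return text.split()


def char_ngrams(text, n=2):
    """Overlapping character n-grams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


# Hypothetical Chinese phrase with no spaces between words.
sample = "免费获取优惠"

print(whitespace_tokens(sample))
print(char_ngrams(sample))
```

A whitespace tokenizer treats the whole phrase as a single token that will rarely recur, whereas overlapping bigrams give the filter short units that can repeat across messages.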

Regards,
David Legg

On 26/07/13 12:30, Eric Charles wrote:
Maybe your training is too wide.
What if you don't train, or only send a few mails for training? Does James still mark all mails as spam?

On 23/07/2013 13:27, Ramchandra Naik wrote:
Hi Guys,

We are using the Bayesian analysis feeder for spam feeding/filtering. After feeding spam into it, the corpus gets reloaded and then the Bayesian analysis marks all my incoming mails as spam. Can you please look into it and suggest a solution?

I am using James Server 3.0-beta4 with MySQL and following is the configuration of bayesian analysis:

        <!-- "not spam" bayesian analysis feeder. -->
        <mailet match="[email protected]" class="BayesianAnalysisFeeder">
           <repositoryPath>db://maildb</repositoryPath>
           <feedType>ham</feedType>
           <maxSize>200000</maxSize>
        </mailet>

        <!-- "spam" bayesian analysis feeder. -->
        <mailet match="[email protected]" class="BayesianAnalysisFeeder">
           <repositoryPath>db://maildb</repositoryPath>
           <feedType>spam</feedType>
           <maxSize>200000</maxSize>
        </mailet>

        <!-- Anti spam bayesian analysis -->
        <mailet match="All" class="BayesianAnalysis" onMailetException="ignore">
           <repositoryPath>db://maildb</repositoryPath>
           <maxSize>200000</maxSize>
           <headerName>X-MessageIsSpamProbability</headerName>
           <ignoreLocalSender>true</ignoreLocalSender>
        </mailet>

        <mailet match="CompareNumericHeaderValue=X-MessageIsSpamProbability > 0.90" class="SetMailAttribute" onMatchException="noMatch">
           <isSpam>true</isSpam>
        </mailet>

        <mailet match="CompareNumericHeaderValue=X-MessageIsSpamProbability > 0.90" class="SetMimeHeader" onMatchException="noMatch">
           <name>X-MessageIsSpam</name>
           <value>true</value>
        </mailet>

        <mailet match="CompareNumericHeaderValue=X-MessageIsSpamProbability > 0.99" class="ToProcessor" onMatchException="noMatch">
           <processor>spam</processor>
           <notice>Spam not accepted</notice>
        </mailet>

        <!-- Send remaining mails to the transport processor for either local or remote delivery -->
        <mailet match="All" class="ToProcessor">
           <processor>transport</processor>
        </mailet>
     </processor>


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]




