Hi Joe,
Digging in deeper yields the following:
1. The mailet log says there is no mail server named hostname.myfirst.domain, so
I changed my config.xml to just [email protected] for Bayesian analysis.
What should I be forwarding my spam to?
Do I need to purchase another domain for this to work?
No you don't need to purchase another domain.
The 'trick' is to ensure that you have SMTP Authentication enabled in
James. Then in your config file you set up the root pipeline such that
any email sent by an authenticated user to your special spam or ham
address gets passed to the Bayesian analysis feeder. If you do this
then the ham or spam addresses can be anything you like. Indeed, it is
preferable to define some nonsense addresses so that only authenticated
users can update the corpus (you don't want spammers to poison your
corpus by sending non-spam to your spam address... rare though that
might be). I'll show you how to do this in a minute after I've answered
your other questions.
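SMTP authentication is enabled in the smtpserver section of config.xml. Here is a rough sketch, assuming a James 2.3-style config.xml. Element names may differ in other versions, so compare them against the config.xml shipped with yours. As I understand it, once authentication is required for relaying, an outsider who hasn't logged in cannot send to an address in a domain James does not host. That is what keeps the nonsense ham/spam addresses private. Hosts listed in authorizedAddresses may relay without logging in, so keep that list tight.

```xml
<!-- Sketch only: assumes a James 2.3-style config.xml; check names against your version. -->
<smtpserver enabled="true">
   <port>25</port>
   <handler>
      <!-- Hosts in this range may relay WITHOUT authenticating (example value). -->
      <authorizedAddresses>127.0.0.0/8</authorizedAddresses>
      <!-- Everyone else must log in (SMTP AUTH) before relaying to non-local domains. -->
      <authRequired>true</authRequired>
   </handler>
</smtpserver>
```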
2. All of the tables except deadletter are empty.
Yes. This indicates that no spam or ham is being added to the corpus.
3. When sending to [email protected], I see a "Corpus loaded" message in
mailet.log, but no entry in the database's bayesian.._spam table.
This message doesn't mean that a spam or ham message was being added. It
simply means the Bayesian filter mailet loaded the corpus, which here is
empty. It is good news that the corpus was loaded. It is bad news that we
know it is empty!
4. When I forward to [email protected], the spam ends up in
file://var/mail/address-error/ and not in the database.
This also points to something being wrong with adding spam or ham to the
corpus. Nothing is picking up these emails and sending them to the
Bayesian feeder.
5. Should my config.xml file include a complete table name in the
repositoryPath, i.e. bayesiananalysis_spam? It is currently db://maildb
without any table name. How does it know the table name? Is it hard-coded
in the class?
I believe the table names are hard-coded, so just specifying
'db://maildb' is sufficient.
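For completeness: 'maildb' in that URL is the name of a data source defined elsewhere in config.xml, and db://maildb just points the feeder at that connection. Below is a sketch of such a definition, assuming a James 2.3-style config.xml with an embedded Derby database. The driver, URL, and credentials are illustrative only. Yours will be whatever your installation already uses, and the name must match the one in repositoryPath.

```xml
<!-- Sketch only: assumes a James 2.3-style config.xml; values are illustrative. -->
<database-connections>
   <data-sources>
      <!-- "maildb" is the name that db://maildb refers to. -->
      <data-source name="maildb" class="org.apache.james.util.dbcp.JdbcDataSource">
         <!-- Example values for an embedded Derby database; substitute your own. -->
         <driver>org.apache.derby.jdbc.EmbeddedDriver</driver>
         <dburl>jdbc:derby:../var/mail/derby;create=true</dburl>
         <user></user>
         <password></password>
         <max>20</max>
      </data-source>
   </data-sources>
</database-connections>
```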
OK, so that's the theory and the questions out of the way.
Now, in your case, according to the config.xml file you published,
you have the following in your root processing pipeline:
<mailet match="[email protected]"
        class="BayesianAnalysisFeeder">
   <repositoryPath> db://maildb </repositoryPath>
   <feedType>ham</feedType>
   <maxSize>500000</maxSize>
</mailet>
<mailet match="[email protected]"
        class="BayesianAnalysisFeeder">
   <repositoryPath> db://maildb </repositoryPath>
   <feedType>spam</feedType>
   <maxSize>500000</maxSize>
</mailet>
Can you see the problem ;-) ?
According to this, you should send your ham messages to
'[email protected]' and your spam messages to... exactly the
same address (assuming this is not an error you made when editing the
file to remove sensitive data).
For comparison, here's the same section in my config.xml:
<!-- "not spam" bayesian analysis feeder. -->
<mailet match="[email protected]"
        class="BayesianAnalysisFeeder">
   <repositoryPath> db://maildb </repositoryPath>
   <feedType>ham</feedType>
   <maxSize>500000</maxSize>
</mailet>
<!-- "spam" bayesian analysis feeder. -->
<mailet match="[email protected]"
        class="BayesianAnalysisFeeder">
   <repositoryPath> db://maildb </repositoryPath>
   <feedType>spam</feedType>
   <maxSize>500000</maxSize>
</mailet>
I send all my spam messages to '[email protected]' and ham messages to
'[email protected]'. I have not sanitized these addresses; they are the
real addresses. I can do this because my mail client (Thunderbird in my
case) is set up to send this email through my James SMTP server, so it
has to log in to the server and is therefore SMTP-authenticated. The
Bayesian feeder extracts this email early in the root pipeline, so James
won't attempt to actually send it to an address in the xxx.yyy domain.
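The two key points are two *different* addresses and feeders placed ahead of anything that delivers or relays mail. Put together, a root processor might start like the sketch below. The addresses are made-up placeholders in a domain this server does not host. I'm also assuming that the redacted match attributes above use James's standard RecipientIs matcher. Substitute your own nonsense addresses.

```xml
<!-- Sketch only: hypothetical placeholder addresses; RecipientIs matcher assumed. -->
<processor name="root">
   <!-- Feeders go first so training messages are picked up before
        any relaying or local-delivery mailets see them. -->
   <!-- "not spam" bayesian analysis feeder. -->
   <mailet match="RecipientIs=not.spam@nonsense.invalid"
           class="BayesianAnalysisFeeder">
      <repositoryPath> db://maildb </repositoryPath>
      <feedType>ham</feedType>
      <maxSize>500000</maxSize>
   </mailet>
   <!-- "spam" bayesian analysis feeder: note the DIFFERENT address. -->
   <mailet match="RecipientIs=spam@nonsense.invalid"
           class="BayesianAnalysisFeeder">
      <repositoryPath> db://maildb </repositoryPath>
      <feedType>spam</feedType>
      <maxSize>500000</maxSize>
   </mailet>
   <!-- The rest of your existing root processor follows unchanged. -->
</processor>
```

If both match attributes name the same address, every training message is fed to whichever feeder comes first, and the other feeder never sees anything.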
Hope this helps.
Regards,
David Legg