On 2015-12-02 09:14, Sebastian Arcus wrote:
> Perfect - that's exactly the sort of real-life based advice I was looking for. Many thanks!

I run a small shared hosting environment with a global Bayes database for all users, as not enough users are ready/willing/able to take the time to sort ham (although more will press "this is spam"). In general, the results work out well enough.

Sharing Bayes between servers or sites would not seem to be particularly different from a shared Bayes between multiple customers in a shared hosting environment, as long as the "typical end user" is similar. If you have a viagra dealer or diet pill retailer as one of your customers, your mileage may vary and they may need more personalization, but in general, for typical SOHO and SMB customers, spammy spam is spammy spam and pretty widely distributed.

From what I see, it's ham that varies a lot per user. So while we try to train Bayes across a wide range of ham sets, we also do a lot of automated whitelisting based on user behaviour: mail that users send, or mail that users keep in their mailboxes, so that we can skip spam filtering entirely for as much "wanted" mail as possible. We also try to reduce filtering on replies when the "In-Reply-To:" header matches certain formats (such as what our webmail produces, what we add to messages missing this header, and a few other formats), so it's possible that someone else who borrowed our Bayes database would end up seeing a higher false-positive rate.
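The reply-detection idea above could be sketched roughly like this. This is a minimal illustration only; the regex patterns, domain names, and function name are hypothetical placeholders, not the actual Message-ID formats any real system uses:

```python
import re

# Hypothetical Message-ID formats that "our" systems might generate.
# Replies whose In-Reply-To header matches one of these could then
# bypass some or all spam filtering. Patterns are illustrative only.
TRUSTED_REPLY_PATTERNS = [
    # IDs our webmail might produce (assumed format)
    re.compile(r"^<\d+\.\w+@webmail\.example\.com>$"),
    # IDs we might add to messages that arrived without one (assumed format)
    re.compile(r"^<added-[0-9a-f]{16}@mx\.example\.com>$"),
]

def is_reply_to_local_message(in_reply_to: str) -> bool:
    """Return True if the In-Reply-To header matches a format we generate."""
    value = in_reply_to.strip()
    return any(p.match(value) for p in TRUSTED_REPLY_PATTERNS)

# A reply referencing a webmail-generated Message-ID is recognized;
# one referencing an arbitrary external ID is not.
print(is_reply_to_local_message("<1449048000.abc123@webmail.example.com>"))  # True
print(is_reply_to_local_message("<random-id@elsewhere.example.org>"))        # False
```

The point of the borrowed-database caveat is that another site would not generate Message-IDs in these formats, so this shortcut would never fire for them and more of their reply traffic would go through full filtering.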

We avoid training mail from big companies (Amazon, eBay, etc.) as spam even when they spam, as long as it's clearly identified in a blockable way; instead we give users the ability to block those senders outright when applicable.

Sure, there are errors and mistakes, but by and large Bayes works out the details in a shared environment. A multi-server environment shouldn't be too different, as long as the customer base is similar.

--
Dave Warren
http://www.hireahit.com/
http://ca.linkedin.com/in/davejwarren