On 12/10/2015 10:54 PM, Marc Perkel wrote:
> I've had bayes disabled in SA because it seems to not be able to stay
> working in a high volume situation. The MySQL server can't seem to keep
> up with it even on very fast computers.

Redis is your friend.
Redis over the wire is faster than any local SDBM/DB file-based backend.
All you need is RAM; the more, the better.

I use site-wide autolearn and auto-feed spam from traps. At the moment, the token TTL is 4 days.
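
For reference, a Redis-backed Bayes setup in local.cf could look roughly like the sketch below. The server address, password, and database number are placeholders, not the values from this site, and the bayes_seen_ttl value is an assumption; only the autolearn setting and the 4-day token TTL follow from the description above.

```
# Sketch of a Redis Bayes backend for SpamAssassin (placeholder values)
bayes_store_module  Mail::SpamAssassin::BayesStore::Redis
# server, password and database below are hypothetical; adjust to your site
bayes_sql_dsn       server=127.0.0.1:6379;password=CHANGEME;database=2
bayes_token_ttl     4d
# assumed value, not stated in this message
bayes_seen_ttl      4d
bayes_auto_expire   1
bayes_auto_learn    1
```

The Clients and Memory figures below are from the output of the Redis INFO command.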

# Clients
connected_clients:35
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:3454088112
used_memory_human:3.22G
used_memory_rss:3528310784
used_memory_peak:3454661016
used_memory_peak_human:3.22G
used_memory_lua:116736
mem_fragmentation_ratio:1.02
mem_allocator:jemalloc-3.6.0

Whether I switch Bayes on or off, I see no change in SA scan speed.
Average SA scan time is 0.8 sec/msg.


> But - thinking about trying something interesting - doing my own bayes
> in a different way.
>
> Here's my question.
>
> Bayes breaks the message down into some sort of tokens and then does
> statistics on those tokens as to tokens found in spam vs. tokens found
> in ham.
>
> But what about combinations of tokens? I'm thinking that I'd like to
> have something that says when it sees tokens X and Y and Z then that's
> spam even though X,Y,Z might be in ham when not combined.
>
> Does bayes do that or is there anything that does?

There's plenty of Bayes documentation on the net, covering different implementations, etc.
Enjoy the math... Not really a pure SA topic.
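
To make the "combinations of tokens" idea concrete, here is a minimal Python sketch of a Bayes-style learner that counts unordered token pairs as features alongside single tokens. It is not SpamAssassin's implementation. The names features, PairBayes, and spamminess are made up for this illustration, tokenizing on whitespace is a simplification, and the four training messages are toy data.

```python
from collections import Counter
from itertools import combinations


def features(text, max_tokens=50):
    """Return single tokens plus unordered token pairs for one message."""
    # Deduplicate and sort so each pair has one canonical order.
    # The cap keeps the number of pairs (quadratic in tokens) bounded.
    tokens = sorted(set(text.lower().split()))[:max_tokens]
    singles = {(t,) for t in tokens}
    pairs = set(combinations(tokens, 2))
    return singles | pairs


class PairBayes:
    """Counts in how many spam/ham messages each feature appears."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.nspam = 0
        self.nham = 0

    def learn(self, text, is_spam):
        counts = self.spam if is_spam else self.ham
        counts.update(features(text))
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def spamminess(self, feature):
        """Laplace-smoothed P(spam | feature), assuming equal priors."""
        p_spam = (self.spam[feature] + 1) / (self.nspam + 2)
        p_ham = (self.ham[feature] + 1) / (self.nham + 2)
        return p_spam / (p_spam + p_ham)


# Toy training data: each word also shows up in ham, the pair never does.
clf = PairBayes()
clf.learn("cheap pills", True)
clf.learn("cheap pills", True)
clf.learn("cheap flights", False)
clf.learn("pills prescription", False)

single_cheap = clf.spamminess(("cheap",))
single_pills = clf.spamminess(("pills",))
pair = clf.spamminess(("cheap", "pills"))
print(single_cheap, single_pills, pair)
```

With this toy data the pair scores as more spammy than either word alone, which is the effect the question describes. The cost is that the number of pair features grows quadratically with message length, so the token database gets much larger than with single tokens.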



