Thank you Matt =). So most of the heuristics seem to be looking for SPAM. What are the ones that would push a mail towards being HAM (and that are not ignored by autolearn bayes)? So far I have found one: ALL_TRUSTED.
A few network tests also qualify:
RCVD_IN_BSP_OTHER RCVD_IN_BSP_TRUSTED HABEAS_USER
Other than that, SA just relies on the message not hitting many spam rules. That is why the default ham autolearn threshold (bayes_auto_learn_threshold_nonspam, 0.1) is slightly positive.
I use a slightly negative autolearn threshold myself, and I have a bunch of custom rules with small negative scores (none below -0.2) that help push mail into my ham autolearning. Even so, ham autolearning is quite a bit less frequent than spam autolearning, as my log counts show.
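For anyone wanting to try the same approach, here is a rough sketch of what such a setup could look like in local.cf. The threshold value, the rule name LOCAL_HAM_LIST and the List-Id pattern are made up for illustration; they are not my actual settings. Only bayes_auto_learn_threshold_nonspam and the header/describe/score directives are standard SA configuration.

```
# Sketch only: the value -0.1 is an assumed example of a "slightly negative" threshold
bayes_auto_learn_threshold_nonspam -0.1

# Hypothetical custom ham rule: mail from a list you subscribe to
header   LOCAL_HAM_LIST  List-Id =~ /<mylist\.example\.org>/
describe LOCAL_HAM_LIST  Mail from a mailing list I subscribe to
score    LOCAL_HAM_LIST  -0.2
```

With a negative threshold, a message that hits no rules at all scores 0 and is no longer autolearned as ham, so small negative rules like this are what pull legitimate mail under the threshold.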
# grep "autolearn=spam" /var/log/maillog |wc -l
9030
# grep "autolearn=not spam" /var/log/maillog |wc -l
478
(Note: I use MailScanner, which uses this log format. IIRC, stock SA logs "ham" instead of "not spam".)
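If you want to run the same comparison on your own system, the two greps above can be folded into one small shell function that accepts either log format. This is just a sketch: the function name autolearn_counts is made up, and the default log path is the one from my commands above, so adjust it for your setup.

```shell
# Hypothetical helper: tally autolearn verdicts in a mail log.
# Counts both "autolearn=not spam" (MailScanner) and "autolearn=ham"
# (stock SA) as ham autolearns.
autolearn_counts() {
    log="${1:-/var/log/maillog}"
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that
    spam=$(grep -c 'autolearn=spam' "$log" || true)
    ham=$(grep -c -E 'autolearn=(not spam|ham)' "$log" || true)
    echo "spam=$spam ham=$ham"
}
```

Call it as `autolearn_counts /var/log/maillog`. Lines logged with other values such as autolearn=no are not counted in either total.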