On 2024-06-25 at 17:38:28 UTC-0400 (Tue, 25 Jun 2024 17:38:28 -0400)
Mark London <m...@psfc.mit.edu>
is rumored to have said:

Bill - Thanks for the response.  As an aside, it would be nice (though impossible?) for a spam filter to be more suspicious of email coming from a new address, one that appears in neither my Sent folder nor my Inbox. FWIW. - Mark

Matija's mention of AWL/TxRep is correct here. While some people find it a nuisance when it turns one false positive (FP) into an ongoing series, I think it is worth enabling for most sites.
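
For anyone wanting to try that, TxRep (the modern replacement for AWL) just needs to be loaded and switched on. A minimal sketch, assuming a 4.x install (3.4.x spells the AWL option use_auto_whitelist):

    # in a .pre file, e.g. local.pre
    loadplugin Mail::SpamAssassin::Plugin::TxRep

    # in local.cf
    use_txrep            1
    # TxRep supersedes AWL, so make sure AWL is off
    use_auto_welcomelist 0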

However, if you do enable either of those tools, you should have a mechanism for feeding FPs both into a sitewide Bayes DB and into the AWL/TxRep DB, using the blocklist/welcomelist options of the spamassassin script.
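
Something along these lines works, assuming 4.x option names (older releases use the --*-whitelist spellings); the message file and sender address here are hypothetical:

    # teach the Bayes DB (sitewide, if so configured) that the message was ham
    sa-learn --ham /path/to/misclassified.eml

    # reset/seed the sender's reputation in the AWL/TxRep DB
    spamassassin --add-addr-to-welcomelist=sender@example.com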



On 6/25/2024 11:21 AM, Bill Cole wrote:
Mark London <m...@psfc.mit.edu>
is rumored to have said:

I received a spam email with the text below that wasn't caught by SpamAssassin (at least mine). The text actually looks like something that was generated using ChatGPT. In any event, I put the text through ChatGPT and asked if it looked like spam. At the bottom of this email is its analysis. I've not been fully reading this group. Has there been any work to allow SpamAssassin to use AI?

"Artificial intelligence" does not exist. It is a misnomer.

Large language models like ChatGPT have a provenance problem. There's no way to know exactly why the model "says" anything. In a single paragraph, ChatGPT is capable of making completely and directly inconsistent assertions. The only way to explain that is that, despite appearances, a request to answer the ham/spam question generates text with no semantic connection to the original message, but which reads like an explanation.

SpamAssassin's code and rules all come from ASF committers, and the scores are determined by examining the scan results from contributors and optimizing them against the default threshold of 5.0. Every scan of a message results in a list of hits against documented rules. The results can be analyzed and understood.
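
For example, a tagged message carries a header shaped like this (the scores here are invented for illustration, but each test name maps to a documented rule):

    X-Spam-Status: Yes, score=7.5 required=5.0 tests=BAYES_99,
            BAYES_999,DKIM_INVALID,URIBL_BLOCKED autolearn=no

You can look up every one of those names and see exactly why the message scored what it did.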

We know that ChatGPT and other LLMs that are publicly available have been trained on data to which they had no license. There is no way to remove any particular ingested data. There's no way to know where any particular LLM will have problems and no way to fix those problems. This all puts them outside of the boundaries we have as an ASF project. However, we do have a plugin architecture, so it is possible for 3rd parties to create a plugin for LLM integration.
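
To illustrate what that plugin route involves: a plugin is an ordinary Perl module subclassing Mail::SpamAssassin::Plugin that registers eval tests for rules to call. A bare-bones sketch follows; the package name, rule name, and placeholder check are hypothetical, not an existing integration:

    package Mail::SpamAssassin::Plugin::LLMCheck;  # hypothetical name
    use strict;
    use warnings;
    use Mail::SpamAssassin::Plugin;
    our @ISA = qw(Mail::SpamAssassin::Plugin);

    sub new {
      my ($class, $mailsa) = @_;
      my $self = $class->SUPER::new($mailsa);
      bless($self, $class);
      # expose an eval test that *.cf rules can reference
      $self->register_eval_rule("llmcheck_verdict");
      return $self;
    }

    sub llmcheck_verdict {
      my ($self, $pms) = @_;
      # a real plugin would hand the message to an external
      # classifier here and return 1 (hit) or 0 (no hit)
      return 0;
    }

    1;

and the matching config:

    loadplugin Mail::SpamAssassin::Plugin::LLMCheck llmcheck.pm
    header   LLM_SUSPECT eval:llmcheck_verdict()
    describe LLM_SUSPECT Message flagged by external classifier
    score    LLM_SUSPECT 0.1

Anyone going that route carries the provenance problem with them, of course: the plugin hook is there, but the ASF project itself won't ship such an integration.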




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo@toad.social and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
