On Mon, 22 Aug 2016 09:03:38 -0700
Marc Perkel <supp...@junkemailfilter.com> wrote:
The ones that are the same are of no interest. Only where it matches
one side and not the other.

On 08/22/16 09:06, Dianne Skoll wrote:
But... but... that's exactly like Bayes if you throw out tokens whose
observed probability is not 0 or 1.

Also, in your list of tokens, they are all phrases ranging from 1 to 4 words,
and that's why you get good results.  Multiword Bayes is just as good,
and I know that from experience.
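
To make the "phrases ranging from 1 to 4 words" point concrete, here is a minimal sketch of multiword tokenization. The function name `phrase_tokens` and the whitespace-based word splitting are assumptions for illustration only; neither SpamAssassin's tokenizer nor the system discussed in this thread is shown here.

```python
def phrase_tokens(text, max_words=4):
    """Return every run of 1..max_words consecutive words as a token.

    Hypothetical helper: splits on whitespace and lowercases, which is
    far cruder than what a real mail filter would do.
    """
    words = text.lower().split()
    tokens = set()
    for n in range(1, max_words + 1):
        # Slide a window of n words across the message.
        for i in range(len(words) - n + 1):
            tokens.add(" ".join(words[i:i + n]))
    return tokens
```

Feeding these phrase tokens to an ordinary Bayes classifier instead of single words is what "multiword Bayes" refers to above.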

On 22.08.16 10:44, Marc Perkel wrote:
This is nothing like bayes. Bayes is creating a mental block.

This is just like bayes.
There are (only) a few differences between what you describe and bayes as
implemented in SA, but it's still bayes-based.

When I describe it to people who don't know bayes they immediately get it. If I describe it to people who know bayes - they confuse it. Bayes is a probability spectrum based on a frequency match on both sets. That's not even close to what I'm doing.

Bayes uses probabilities between 0 and 1, while you only accept 0 and 1.
You have just tweaked Bayes, and I'm not even sure it is towards better
detection (I believe, towards worse).
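
The relationship claimed here can be sketched in a few lines: compute a per-token spam probability from both corpora, then throw away everything that is not exactly 0 or 1. This is a deliberately simplified estimate, not the formula SA actually uses, and the names `token_spam_probability` and `extremes_only` are made up for the example. Both corpora are assumed to be non-empty.

```python
from collections import Counter


def token_spam_probability(ham_docs, spam_docs):
    """Simplified per-token spam probability (assumes non-empty corpora)."""
    # Count each token once per message it appears in.
    ham_counts = Counter(t for d in ham_docs for t in set(d.lower().split()))
    spam_counts = Counter(t for d in spam_docs for t in set(d.lower().split()))
    probs = {}
    for tok in set(ham_counts) | set(spam_counts):
        h = ham_counts[tok] / len(ham_docs)    # fraction of ham containing tok
        s = spam_counts[tok] / len(spam_docs)  # fraction of spam containing tok
        probs[tok] = s / (h + s)
    return probs


def extremes_only(probs):
    """Keep only tokens seen on one side: probability exactly 0 or 1."""
    return {t: p for t, p in probs.items() if p in (0.0, 1.0)}
```

A token that appears in both corpora ends up strictly between 0 and 1 and is discarded by `extremes_only`, which is the "only where it matches one side and not the other" rule expressed in Bayes terms.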

Also - some of what I'm doing is all combinations, not just sequential. So it's like a system that writes and scores its own rules. I just throw data at it and it classifies it.
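
One possible reading of "all combinations, not just sequential" is that a token is any small set of words that co-occur in a message, regardless of position. The sketch below assumes that reading; `combination_tokens` is a hypothetical name, and the actual system may work differently.

```python
from itertools import combinations


def combination_tokens(text, max_words=3):
    """Return every set of 1..max_words distinct words in the message.

    Words are sorted first so the same word set always yields the same
    tuple, whatever order the words appeared in.
    """
    words = sorted(set(text.lower().split()))
    tokens = set()
    for n in range(1, max_words + 1):
        tokens.update(combinations(words, n))
    return tokens
```

The number of tokens grows combinatorially with message length, so `max_words` has to stay small for this to be practical.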

The main difference from Bayes as implemented in SA is that you make
multiword tokens.  This is good, but you aren't even the first one who
proposed or did that.  The second main difference is in the point above.

The real magic is the feedback learning. So as it identifies ham it learns new words and phrases that then match email from other people. So it learns how normal people speak, it learns how spammers speak, and it identifies the DIFFERENCES between the two. And it's completely automated.

This is just the same as SA Bayes with autolearning. However, it will suffer
the same issues and thus will require learning from other sources, either
manual or other SA rules.
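
A rough sketch of such a feedback loop, to show where the issue comes from. Everything here is an assumption for illustration: the name `autolearn`, the hit-count score, and the `threshold` value are invented, and this is not how SA decides what to autolearn.

```python
def autolearn(message, ham_seen, spam_seen, threshold=2):
    """Classify a message and, if confident, learn its tokens (in place).

    ham_seen and spam_seen are sets of tokens learned so far. Only
    tokens seen on exactly one side contribute to the score.
    """
    tokens = set(message.lower().split())
    spam_hits = len(tokens & (spam_seen - ham_seen))
    ham_hits = len(tokens & (ham_seen - spam_seen))
    score = spam_hits - ham_hits
    if score >= threshold:
        spam_seen |= tokens  # the classifier trains on its own verdict
        return "spam"
    if score <= -threshold:
        ham_seen |= tokens
        return "ham"
    return "unsure"  # not confident enough, nothing is learned
```

Because the classifier trains on its own verdicts, one confident mistake adds wrong tokens that make the next mistake more likely. That is why an independent source of labels is still needed.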

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
The 3 biggets disasters: Hiroshima 45, Tschernobyl 86, Windows 95
