On Aug 22, 2016, at 10:44 AM, Marc Perkel <supp...@junkemailfilter.com> wrote:



On 08/22/16 09:06, Dianne Skoll wrote:
On Mon, 22 Aug 2016 09:03:38 -0700
Marc Perkel <supp...@junkemailfilter.com> wrote:

The ones that are the same are of no interest. Only where it matches
one side and not the other.
But... but... that's exactly like Bayes if you throw out tokens whose
observed probability is not 0 or 1.

Also, in your list of tokens, they are all phrases ranging from 1 to 4 words,
and that's why you get good results.  Multiword Bayes is just as good,
and I know that from experience.
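A rough way to see the equivalence claimed above is to compute unsmoothed
per-token spam probabilities and keep only the extreme ones. The sketch below
is an illustration only: the function names and the four sample messages are
made up, and it is not SpamAssassin's Bayes code or the filter under
discussion.

```python
from collections import Counter


def token_probabilities(spam_docs, ham_docs):
    """Observed P(spam | token) from per-corpus document frequencies, unsmoothed."""
    spam_counts = Counter(tok for doc in spam_docs for tok in set(doc.split()))
    ham_counts = Counter(tok for doc in ham_docs for tok in set(doc.split()))
    probs = {}
    for tok in set(spam_counts) | set(ham_counts):
        s = spam_counts[tok] / len(spam_docs)  # fraction of spam containing tok
        h = ham_counts[tok] / len(ham_docs)    # fraction of ham containing tok
        probs[tok] = s / (s + h)
    return probs


def one_sided_tokens(spam_docs, ham_docs):
    """Tokens that appear in exactly one of the two corpora."""
    spam_vocab = {tok for doc in spam_docs for tok in doc.split()}
    ham_vocab = {tok for doc in ham_docs for tok in doc.split()}
    return spam_vocab ^ ham_vocab


# Hypothetical sample data, chosen only to make the point visible.
spam = ["cheap pills online", "cheap watches online now"]
ham = ["meeting notes online", "lunch meeting now"]

probs = token_probabilities(spam, ham)
extreme = {tok for tok, p in probs.items() if p in (0.0, 1.0)}

# Tokens with observed probability 0 or 1 are exactly the one-sided tokens.
print(extreme == one_sided_tokens(spam, ham))
```

Tokens seen on both sides ("online", "now") land strictly between 0 and 1 and
drop out, which is the same as ignoring "the ones that are the same".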



This is nothing like Bayes. Bayes is creating a mental block. When I describe 
it to people who don't know Bayes they immediately get it. If I describe it to 
people who know Bayes, they confuse it. Bayes is a probability spectrum based 
on a frequency match on both sets. That's not even close to what I'm doing.


I think you've copied and pasted this same paragraph half a dozen times now, 
and the list has tried its best to accommodate your statement that "Bayes is 
creating a mental block" by asking you pertinent questions. Those questions 
either remained unanswered or, when answered, drew conflicting statements, and 
when pressed you ended up showing that what you are doing is (at best) a 
slightly modified version of Bayes.

However, I find the statement "When I describe it to people who don't know 
Bayes they immediately get it" the most telling of them all. Of course people 
who don't know probability theory will look at what you are doing and go 
"Wow!!! This is great!!" BECAUSE THEY DON'T KNOW.

People who know, obviously, recognize it for what it is. You can claim as 
much as you like that it's NOT, but at the end of the day, if it looks like a 
rose and smells like a rose, then (no matter what you call it) 'tis still a rose!

All you have to do is READ the Process section of the following link to see 
exactly how similar your explanation is (save one factor which is using phrases 
vs. words), which has already been explained as a feature of SA using 
multi-word tokens:
https://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering



Also - some of what I'm doing is all combinations, not just sequential. So it's 
like a system that writes and scores its own rules. I just throw data at it 
and it classifies it.
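To make the distinction between "sequential" and "all combinations" concrete,
here is a small sketch. It assumes "sequential" means contiguous runs of 1 to 4
words and "all combinations" means any 1 to 4 words taken in their original
order; the message does not spell this out, so both readings and both function
names are assumptions.

```python
from itertools import combinations


def sequential_phrases(words, max_len=4):
    """Contiguous runs of 1..max_len words (ordinary n-grams)."""
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }


def combination_phrases(words, max_len=4):
    """Any 1..max_len words in original order, with gaps allowed."""
    return {
        " ".join(combo)
        for n in range(1, max_len + 1)
        for combo in combinations(words, n)
    }


words = "buy cheap pills now".split()
seq = sequential_phrases(words)
combo = combination_phrases(words)

print(len(seq), len(combo))      # 10 contiguous phrases vs 15 combinations
print(sorted(combo - seq))       # the gapped phrases, e.g. "buy pills"
```

Every sequential phrase is also a combination, so the combination set only
adds gapped tokens such as "buy pills"; the token count grows quickly with
message length.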

The real magic is the feedback learning. So as it identifies ham it learns new 
words and phrases that then match email from other people. So it learns how 
normal people speak, it learns how spammers speak, and it identifies the 
DIFFERENCES between the two. And it's completely automated.
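The feedback loop described above might look roughly like the following. This
is a guess at the general shape, not the actual implementation: the class name,
the hit-counting rule, and the sample tokens are all hypothetical.

```python
class OneSidedFilter:
    """Toy classifier that scores only tokens seen on one side (hypothetical)."""

    def __init__(self, spam_seen, ham_seen):
        self.spam_seen = set(spam_seen)
        self.ham_seen = set(ham_seen)

    def classify(self, message):
        tokens = set(message.lower().split())
        # Count only tokens that have been seen on one side and not the other.
        spam_hits = len(tokens & (self.spam_seen - self.ham_seen))
        ham_hits = len(tokens & (self.ham_seen - self.spam_seen))
        if spam_hits > ham_hits:
            return "spam"
        if ham_hits > spam_hits:
            return "ham"
        return "unsure"

    def classify_and_learn(self, message):
        """Classify, then feed the message's tokens back into the matching set."""
        label = self.classify(message)
        tokens = set(message.lower().split())
        if label == "spam":
            self.spam_seen |= tokens
        elif label == "ham":
            self.ham_seen |= tokens
        return label


f = OneSidedFilter(spam_seen={"cheap", "pills"}, ham_seen={"meeting", "notes"})
before = f.classify("agenda attached")              # no known tokens yet
f.classify_and_learn("meeting notes agenda attached")
after = f.classify("agenda attached")               # tokens learned from the ham above
print(before, after)
```

The second message is only recognized as ham because the first one taught the
filter "agenda" and "attached".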


--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400

