On Sat, 28 May 2016, Bill Cole wrote:

> There is sound statistical theory consistent with empirical evidence underpinning the Bayes classifier implementation in SA. While there can be legitimate critiques of the SA implementation specifically and, more generally, of how well email word frequency fits Bayes' Theorem, injecting a pile of new derivative meta-tokens based on pre-conceived notions of "concepts" into the Bayesian analysis invalidates the assumption of what the input for Naive Bayes analysis is: *independent* features. The "concepts" approach adds words that are *dependent* on the presence of other words in the document and, to make it worse, those dependent words may already exist in some pristine messages. It unmoors the SA Bayes implementation from any theoretical grounding, converting its complex math from statistical analysis into arbitrary numerology.
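To make the independence point concrete, here is a minimal sketch of textbook Naive Bayes log-odds scoring. This is a simplification, not SA's actual combiner (SA uses a chi-squared style combining of token probabilities), and the token probabilities, the `log_odds` and `add_concept` helpers, and the `CONCEPT:pharma` token are all hypothetical, made up for illustration. The sketch shows that a "concept" token whose presence is fully implied by an existing word adds that word's evidence a second time.

```python
import math

# Hypothetical per-token statistics, made up for illustration:
# (P(token | spam), P(token | ham)).
TOKEN_PROBS = {
    "viagra": (0.20, 0.001),
    "cheap": (0.10, 0.02),
    "meeting": (0.01, 0.05),
}

def log_odds(tokens, probs, prior_spam=0.5):
    """Textbook Naive Bayes log-odds of spam: the log prior ratio plus the
    sum of per-token log likelihood ratios. Summing is only justified if
    the tokens are conditionally independent given the class."""
    score = math.log(prior_spam / (1.0 - prior_spam))
    for tok in tokens:
        if tok in probs:
            p_spam, p_ham = probs[tok]
            score += math.log(p_spam / p_ham)
    return score

def add_concept(tokens):
    """Hypothetical 'concept' meta-token derived from an existing word.
    It appears exactly when 'viagra' appears, so it is fully dependent."""
    return tokens + ["CONCEPT:pharma"] if "viagra" in tokens else tokens

words = ["cheap", "viagra", "meeting"]
probs = dict(TOKEN_PROBS)
# Trained on the same messages, the derived token ends up with the same
# statistics as the word it was derived from.
probs["CONCEPT:pharma"] = TOKEN_PROBS["viagra"]

plain = log_odds(words, probs)
with_concept = log_odds(add_concept(words), probs)
# The difference equals log(200): the 'viagra' evidence is counted twice.
print(f"plain={plain:.3f} with_concept={with_concept:.3f}")
```

Nothing new was learned about the message, yet the score moved by the full weight of one token; that is the sense in which dependent features turn the arithmetic into double counting.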

Based on that, do you have an opinion on the proposal to add two-word (or configurable-length) combinations to Bayes?
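For reference, a minimal sketch of one possible reading of "two-word (or configurable-length) combinations": adjacent words joined into an extra token alongside the single words. The `ngram_tokens` name and the space-joined format are assumptions for illustration, not SA's tokenizer or the actual proposal's design.

```python
def ngram_tokens(words, n=2):
    """Return the original words plus every run of n adjacent words,
    joined with a single space (hypothetical token format)."""
    if n < 2:
        # n = 1 would only duplicate the existing single-word tokens.
        return list(words)
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return list(words) + grams

print(ngram_tokens(["buy", "cheap", "pills"]))
print(ngram_tokens(["buy", "cheap", "pills"], n=3))
```

Note that a token like "buy cheap" can only occur when both "buy" and "cheap" occur, so under this reading the combined tokens are also derived from words already in the message.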

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Gun Control is marketed to the public using the appealing delusion
  that violent criminals will obey the law.
-----------------------------------------------------------------------
 2 days until Memorial Day - honor those who sacrificed for our liberty
