Title: Re: Bayes question
> So, what happens when you take these two overlapping databases and
> combine them is that certain tokens (those that have overlap) are then
> double counted.  This makes the database, at least according to the
> bayes model SA is using, statistically invalid.

Following that reasoning, the overlapping tokens can be traced back to the same messages, because those messages have the same hashes in both databases. It should therefore be possible to detect the tokens that are being double counted and discard the duplicate counts during the merge.

If that can be done, then surely the merged database remains statistically valid and the two can be safely combined?
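To make the idea concrete, here is a rough Python sketch of a merge that skips messages already counted. It assumes a hypothetical database layout in which every learned message is remembered by its hash together with the set of tokens it contributed. That layout is an assumption for illustration, not how SpamAssassin's Bayes store is actually organised, and the names BayesDB, learn and merge are made up.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class BayesDB:
    """Hypothetical token store (NOT SpamAssassin's real format).

    Assumption: each learned message is kept by its hash, along with
    its spam/ham label and the set of tokens it contained.
    """
    messages: dict[str, tuple[bool, frozenset[str]]] = field(default_factory=dict)

    def learn(self, msg_hash: str, is_spam: bool, tokens) -> None:
        # A hash we have already seen means the message was learned before;
        # counting it again is exactly the double counting to avoid.
        if msg_hash in self.messages:
            return
        self.messages[msg_hash] = (is_spam, frozenset(tokens))

    def token_counts(self) -> tuple[Counter, Counter]:
        # Count, per token, how many spam and ham messages contain it
        # (each token counts at most once per message).
        spam, ham = Counter(), Counter()
        for is_spam, tokens in self.messages.values():
            (spam if is_spam else ham).update(tokens)
        return spam, ham

    @property
    def nspam(self) -> int:
        return sum(1 for is_spam, _ in self.messages.values() if is_spam)

    @property
    def nham(self) -> int:
        return sum(1 for is_spam, _ in self.messages.values() if not is_spam)


def merge(a: BayesDB, b: BayesDB) -> BayesDB:
    """Combine two stores, learning each message hash only once."""
    merged = BayesDB()
    for db in (a, b):
        for msg_hash, (is_spam, tokens) in db.messages.items():
            merged.learn(msg_hash, is_spam, tokens)
    return merged
```

With this layout a message present in both databases contributes its tokens once, so the merged counts match what a single database would have learned from the union of the messages. If the same hash is labelled spam in one database and ham in the other, this sketch keeps whichever label it sees first; a real merge would need an explicit policy for that conflict.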


