On Thu, 15 Feb 2018 00:01:18 +0100
Reindl Harald wrote:

> On 14.02.2018 at 23:07, RW wrote:
> > My point is that an imbalance doesn't create a bias
>
> wrong - what you tried to say was "doesn't necessarily create a bias"
> - but in fact when the imbalance is too big *it does*
>
> simply think about how bayes works makes that clear: each word a
> token with ham/spam counter - when you have 1 Mio of one type and
> 10000 of the other type guess how that counter start to get biased

As I said, Bayes is based on frequencies. If a token occurs in 10% of ham and 0.5% of spam based on 10,000 hams and 10,000 spams, what do you think is likely to happen to those percentages with 10,000 hams and 1,000,000 spams?
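
To put numbers on that, here is a rough sketch in Python. It is not SpamAssassin's actual code; the function names and the simple Graham-style ratio are my own assumptions, chosen only to illustrate the arithmetic. The token counts are derived from the percentages above (0.5% of 10,000 spams is 50 hits, 0.5% of 1,000,000 spams is 5,000 hits).

```python
def token_spam_probability(spam_hits, n_spam, ham_hits, n_ham):
    """Frequency-based estimate: each counter is divided by the size of
    its own corpus before the two classes are compared.
    (Illustrative formula, not taken from any particular filter.)"""
    spam_freq = spam_hits / n_spam
    ham_freq = ham_hits / n_ham
    return spam_freq / (spam_freq + ham_freq)


def raw_count_ratio(spam_hits, ham_hits):
    """Hypothetical naive estimate from the raw counters alone,
    shown only for contrast."""
    return spam_hits / (spam_hits + ham_hits)


# Token seen in 10% of ham and 0.5% of spam.
balanced = token_spam_probability(50, 10_000, 1_000, 10_000)
imbalanced = token_spam_probability(5_000, 1_000_000, 1_000, 10_000)

print(f"10,000 spam / 10,000 ham:    {balanced:.4f}")
print(f"1,000,000 spam / 10,000 ham: {imbalanced:.4f}")
print(f"raw counters, imbalanced:    {raw_count_ratio(5_000, 1_000):.4f}")
```

The first two figures come out the same, because the per-class frequencies have not changed. Only an estimate built from the raw counters, without dividing by the number of messages trained in each class, moves with the imbalance.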