Wow, thanks a lot, it seems very interesting. What you suggested
means weighting each word differently when I apply the cosine
similarity, where each weight is the frequency of the word in the seed
documents. It is not clear to me how to compute and use the
anomalously common co-occurrences, but I'll investigate.
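Below is a minimal Python sketch of both ideas, under stated assumptions. `weighted_cosine` follows the summary above: each word's contribution to the cosine is scaled by its total count in the seed documents. For the "anomalously common co-occurrences", the sketch assumes a log-likelihood ratio (G²) test on a 2x2 contingency table of counts, a common way to score such pairs; whether that is exactly what was suggested is an assumption. All names (`seed_weights`, `weighted_cosine`, `llr`) are hypothetical and not from any library.

```python
import math
from collections import Counter


def seed_weights(seed_docs):
    """Hypothetical helper: weight of each word = its total count in the seed docs."""
    counts = Counter()
    for doc in seed_docs:
        counts.update(doc.lower().split())
    return counts


def weighted_cosine(a, b, weights):
    """Cosine similarity of term-count vectors a and b (dicts word -> count),
    with each word's term scaled by weights[word]; unweighted words count as 0."""
    dot = sum(weights.get(t, 0) * a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(weights.get(t, 0) * v * v for t, v in a.items()))
    norm_b = math.sqrt(sum(weights.get(t, 0) * v * v for t, v in b.items()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def _x_log_x(x):
    # Convention: 0 * log(0) = 0
    return 0.0 if x == 0 else x * math.log(x)


def _entropy(*counts):
    # Unnormalized Shannon entropy: N*H(p) computed directly from raw counts
    return _x_log_x(sum(counts)) - sum(_x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Assumed G^2 score for a 2x2 table:
    k11 = both words, k12 = A without B, k21 = B without A, k22 = neither."""
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    # Guard against tiny negative values caused by floating-point round-off
    return max(0.0, 2.0 * (row + col - mat))


if __name__ == "__main__":
    w = seed_weights(["apache mahout clustering", "mahout recommender clustering"])
    doc1 = Counter("mahout clustering tutorial".split())
    doc3 = Counter("apache mahout".split())
    print(weighted_cosine(doc1, doc3, w))
    print(llr(100, 0, 0, 100), llr(10, 10, 10, 10))
```

With this weighting, words that never appear in the seeds get weight 0 and are ignored entirely, so two documents sharing only seed words can score 1.0; a small floor weight is one possible tweak. Note that a large `llr` flags a pair whose co-occurrence departs from chance in either direction, so to keep only *anomalously common* pairs one would also check that k11 exceeds its expected count under independence.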
Thanks a lot
Marco
On 20 Jul 2011, at 20:36, Ted Dunning wrote:
frequency weighted cosine distance