Re: tf-idf + svd + cosine similarity

Fernando Fernández Wed, 15 Jun 2011 03:58:13 -0700

I think LanczosSolver produces negative values as well; I don't know
about SSVD.

I guess that if the similarity has a large negative value, you can say that
the documents talk about things that almost never appear together in the
same text (if term A appears, then term B won't appear). I think this is
almost impossible in practice, though (at least the most extreme case, with
similarity = -1), as there are always common expressions that appear in many
documents. I think that's why avg(similarity) is always above 0 in your
case.
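
To illustrate the point about signs, here is a minimal Python sketch (not
Mahout code; the `cosine` helper and all vector values are made up for
illustration). It assumes that raw tf-idf weights are non-negative, while
coordinates in the reduced space can carry either sign:

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Raw tf-idf weights are non-negative, so every term of the dot product
# is >= 0 and the cosine cannot drop below 0.
tfidf_a = [0.5, 0.0, 1.2]
tfidf_b = [0.0, 0.8, 0.3]
sim_tfidf = cosine(tfidf_a, tfidf_b)

# Hypothetical coordinates after projection to a reduced space:
# components can have mixed signs, so the cosine can be negative.
reduced_a = [0.9, -0.4]
reduced_b = [-0.7, 0.5]
sim_reduced = cosine(reduced_a, reduced_b)

# Exactly opposed vectors give the extreme case, a cosine of -1
# (up to floating-point rounding).
sim_opposed = cosine([1.0, -2.0], [-1.0, 2.0])

print(sim_tfidf, sim_reduced, sim_opposed)
```

Here sim_tfidf is positive, sim_reduced is negative, and sim_opposed is -1
up to rounding, which matches the reading of -1 as "most dissimilar".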

2011/6/15 Sean Owen <sro...@gmail.com>

> The features all take on non-negative values here, right?
> Then the cosine can't be negative.
>
> In another context, where features could be negative, cosine could
> indeed be negative. -1 means most dissimilar of all -- the feature
> vectors are exactly opposed.
>
> On Wed, Jun 15, 2011 at 10:10 AM, Stefan Wienert <ste...@wienert.cc>
> wrote:
> > Ignoring is no option... so I have to interpret these values.
> > Can one say that documents with similarity = -1 are the less similar
> > documents? I don't think this is right.
> > Any other assumptions?
>
