I think that LanczosSolver produces negative values as well; I don't know about SSVD.
I guess that if the similarity has a large negative value, you can say that the documents talk about things that almost never appear together in the same text (if term A appears, then term B won't). But I think this is almost impossible in practice (at least in the most extreme case, with similarity = -1), as there are always common expressions that appear in many documents. I think that's why avg(similarity) is always above 0 in your case.

2011/6/15 Sean Owen <sro...@gmail.com>

> The features all take on non-negative values here, right?
> Then the cosine can't be negative.
>
> In another context, where features could be negative, cosine could
> indeed be negative. -1 means most dissimilar of all -- the feature
> vectors are exactly opposed.
>
> On Wed, Jun 15, 2011 at 10:10 AM, Stefan Wienert <ste...@wienert.cc>
> wrote:
> > Ignoring is no option... so I have to interpret these values.
> > Can one say that documents with similarity = -1 are the less similar
> > documents? I don't think this is right.
> > Any other assumptions?
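
To make the sign argument in this thread concrete, here is a minimal sketch in plain Python. It is not Mahout code: the `cosine` helper and all vectors are made up for illustration, and the "reduced" vectors are hypothetical coordinates of the kind an SVD projection might produce, not output from LanczosSolver or SSVD.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Raw term counts are non-negative, so every product in the dot
# product is >= 0 and the cosine cannot drop below 0.
doc_a = [2, 0, 1, 0]  # made-up term counts
doc_b = [0, 3, 0, 1]  # shares no terms with doc_a
raw_sim = cosine(doc_a, doc_b)  # 0.0: no overlap, but not negative

# Hypothetical coordinates in a reduced space, where components may
# be negative. These two vectors point in opposite directions.
red_a = [0.5, -0.8]
red_b = [-0.5, 0.8]
red_sim = cosine(red_a, red_b)  # approximately -1

print(raw_sim, red_sim)
```

With non-negative features the worst case is 0 (no shared terms), while in the reduced space a value near -1 means the two vectors are exactly opposed.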