One question that I don't think has been answered yet is that of negative similarities. In the literature you can find that similarity = -1 means that "documents talk about opposite topics", but I think this is quite an abstract idea... I just ignore them: when I'm trying to find the top-k similar documents, these surely won't be useful. I read recently that this has to do with the assumptions in SVD, which is designed for normally distributed data (this implies the possibility of negative values). There are other techniques (non-negative matrix factorization) that try to solve this. I don't know if there's something in Mahout about this.
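
To make this concrete, here is a minimal sketch of what I mean. It is plain NumPy, not Mahout code, and the toy term-document matrix and the helper names (reduced_doc_vectors, cosine_matrix, top_k_similar) are made up for illustration. The input counts are all non-negative, yet after truncating the SVD to rank 2 the cosine between documents 0 and 1 (which share no terms) comes out slightly negative. The top-k helper simply skips non-positive similarities, as described above.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
# Documents 0 and 1 share no terms; document 2 overlaps with both.
A = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
], dtype=float)

def reduced_doc_vectors(A, k):
    """Document coordinates in the rank-k SVD space (rows of V_k * S_k)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:k].T * s[:k]  # shape: (number of documents, k)

def cosine_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / norms
    return Xn @ Xn.T

def top_k_similar(sims, doc, k):
    """Indices of the k most similar documents, ignoring non-positive scores."""
    order = np.argsort(-sims[doc])  # most similar first
    return [int(j) for j in order if j != doc and sims[doc, j] > 0][:k]

docs = reduced_doc_vectors(A, k=2)
sims = cosine_matrix(docs)

print(np.round(sims, 3))
print(top_k_similar(sims, doc=0, k=2))
```

With the full rank (k=3) the cosines equal those of the original non-negative columns, so the negative value here only appears because of the truncation.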
Best,
Fernando.

2011/6/15 Ted Dunning <[email protected]>

> The normal terminology is to name U and V in SVD as "singular vectors" as
> opposed to eigenvectors. The term eigenvectors is normally reserved for the
> symmetric case of U S U' (more generally, the Hermitian case, but we only
> support real values).
>
> On Wed, Jun 15, 2011 at 12:35 AM, Dmitriy Lyubimov <[email protected]> wrote:
>
> > I beg to differ... U and V are left and right eigenvectors, and
> > singular values is denoted as Sigma (which is a square root of eigen
> > values of the AA' as you correctly pointed out) .
> >
