Ignoring them is not an option... so I have to interpret these values. Can one say that documents with similarity = -1 are the least similar documents? I don't think this is right. Any other interpretations?
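To make the question concrete, here is a minimal sketch in plain Python of how cosine similarity behaves for vectors that can have negative components, as they can after SVD. The vectors and the document names are made up for illustration and are not Mahout output. It only shows that a ranking by cosine puts the -1 case last; whether "opposite direction" should be read as "least similar" or whether the orthogonal case (similarity 0) is the truly unrelated one is exactly the open question.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; lies in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical document vectors in a 2-dimensional reduced space.
query = [1.0, 2.0]
docs = {
    "same_direction": [2.0, 4.0],    # positive multiple of the query
    "orthogonal": [2.0, -1.0],       # dot product with the query is 0
    "opposite": [-1.0, -2.0],        # negative multiple of the query
}

# Score every document against the query, then rank from high to low.
scores = {name: cosine(query, vec) for name, vec in docs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(name, round(scores[name], 6))
```

For a top-k search the sign makes no practical difference, since the negative scores fall to the bottom of the ranking anyway. The sign matters only if the magnitude is used instead of the signed value, in which case -1 would wrongly count as a strong match.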

2011/6/15 Fernando Fernández <[email protected]>:
> One question that I think has not been answered yet is that of the
> negative similarities. In the literature you can find that similarity=-1
> means that "documents talk about opposite topics", but I think this is
> quite an abstract idea... I just ignore them; when I'm trying to find the
> top-k similar documents these surely won't be useful. I read recently that
> this has to do with the assumptions in SVD, which is designed for normal
> distributions (this implies the possibility of negative values). There are
> other techniques (non-negative factorization) that try to solve this. I
> don't know if there's something in Mahout about this.
>
> Best,
>
> Fernando.
>
> 2011/6/15 Ted Dunning <[email protected]>
>
>> The normal terminology is to name U and V in SVD as "singular vectors" as
>> opposed to eigenvectors. The term eigenvectors is normally reserved for
>> the symmetric case of U S U' (more generally, the Hermitian case, but we
>> only support real values).
>>
>> On Wed, Jun 15, 2011 at 12:35 AM, Dmitriy Lyubimov <[email protected]> wrote:
>>
>> > I beg to differ... U and V are left and right eigenvectors, and the
>> > singular values are denoted as Sigma (which are the square roots of the
>> > eigenvalues of AA', as you correctly pointed out).
>> >
>> >

--
Stefan Wienert

http://www.wienert.cc
[email protected]

Telefon: +495251-2026838
Mobil: +49176-40170270
