Hmm. It seems I have plenty of negative results (nearly half of the similarities are negative). I could add 0.3 so that the most negative results end up near 0, but this is not optimal... Alternatively, I could map the results to [0..1]. Any other suggestions or comments?
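Two ways of getting rid of the negative values can be compared in a short sketch. The first is a linear rescaling of cosine similarity from [-1, 1] onto [0, 1], which keeps the ranking intact. The second clamps every value <= 0 to 0, as Jake suggests in the reply quoted below. Unlike adding a fixed offset such as 0.3, neither can leave [0, 1] for inputs in [-1, 1]. The function names and sample similarities are hypothetical and plain Python; no Mahout API is involved.

```python
def rescale(sim: float) -> float:
    """Linearly map a cosine similarity from [-1, 1] onto [0, 1]; the ranking is unchanged."""
    return (sim + 1.0) / 2.0


def clamp(sim: float) -> float:
    """Treat every similarity <= 0 as 'maximally dissimilar' (0.0); keep positive values."""
    return max(0.0, sim)


# Hypothetical similarities of one query document to four others in the reduced SVD space.
sims = [0.8, 0.1, -0.2, -0.9]

print("rescaled:", [round(rescale(s), 3) for s in sims])
print("clamped: ", [clamp(s) for s in sims])
```

Rescaling keeps the distinctions among negative values, which matters if a full ranking is needed. Clamping collapses them, which matches the "maximally dissimilar" reading and makes no difference for top-k retrieval.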
Cheers
Stefan

2011/6/15 Jake Mannix <jake.man...@gmail.com>:
> While your original vectors never had similarity less than zero, after
> projection onto the SVD space you may "project away" similarities
> between two vectors, and they are now negatively correlated in this
> space (think about projecting (1,0,1) and (0,1,1) onto the 1-d vector
> space spanned by (1,-1,0); they go from having similarity +1/2
> to similarity -1).
>
> I always interpret all similarities <= 0 as "maximally dissimilar",
> even if technically -1 is where this is exactly true.
>
> -jake
>
> On Wed, Jun 15, 2011 at 2:10 AM, Stefan Wienert <ste...@wienert.cc> wrote:
>
>> Ignoring them is not an option... so I have to interpret these values.
>> Can one say that documents with similarity = -1 are the least similar
>> documents? I don't think this is right.
>> Any other interpretations?
>>
>> 2011/6/15 Fernando Fernández <fernando.fernandez.gonza...@gmail.com>:
>> > One question that I think has not been answered yet is that of the
>> > negative similarities. In the literature you can find that
>> > similarity = -1 means that "documents talk about opposite topics", but
>> > I think this is a rather abstract idea... I just ignore them; when I'm
>> > trying to find the top-k similar documents, these surely won't be
>> > useful. I read recently that this has to do with the assumptions
>> > behind SVD, which is designed for normal distributions (this implies
>> > the possibility of negative values). There are other techniques
>> > (non-negative matrix factorization) that try to solve this. I don't
>> > know if there's anything in Mahout for this.
>> >
>> > Best,
>> >
>> > Fernando.
>> >
>> > 2011/6/15 Ted Dunning <ted.dunn...@gmail.com>
>> >
>> >> The usual terminology is to call U and V in the SVD "singular vectors"
>> >> rather than eigenvectors.
>> >> The term eigenvectors is normally reserved for the symmetric case of
>> >> U S U' (more generally, the Hermitian case, but we only support real
>> >> values).
>> >>
>> >> On Wed, Jun 15, 2011 at 12:35 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>> >>
>> >> > I beg to differ... U and V are the left and right eigenvectors, and
>> >> > the singular values are denoted as Sigma (they are the square roots
>> >> > of the eigenvalues of AA', as you correctly pointed out).
>>
>> --
>> Stefan Wienert
>>
>> http://www.wienert.cc
>> ste...@wienert.cc
>>
>> Phone: +495251-2026838
>> Mobile: +49176-40170270
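Jake's projection example can be checked numerically. The helpers `cosine` and `project` below are hypothetical plain-Python names, not Mahout API. Computing the cosine directly gives 1/2 for (1,0,1) and (0,1,1) and -1 after both are projected onto the line spanned by (1,-1,0). The sketch assumes the projection target is given as an orthonormal basis, as the columns of U from an SVD would be.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def project(v, basis):
    """Coordinates of v in the subspace spanned by the given orthonormal basis vectors."""
    return [sum(x * y for x, y in zip(v, u)) for u in basis]


a = [1.0, 0.0, 1.0]
b = [0.0, 1.0, 1.0]

# Orthonormal basis for the 1-d space spanned by (1, -1, 0).
n = math.sqrt(2.0)
basis = [[1.0 / n, -1.0 / n, 0.0]]

print("original space: ", cosine(a, b))
print("after projection:", cosine(project(a, basis), project(b, basis)))
```

The shared third component that made the two vectors similar is orthogonal to (1,-1,0), so the projection discards it. Only the components on which the vectors disagree survive.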
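The terminology point in Ted's and Dmitriy's messages can be illustrated on a small matrix. The columns of U are eigenvectors of AA'. Each singular value is the square root of the matching eigenvalue. The right singular vectors follow as v = A'u / sigma. A is generally neither square nor symmetric, so these are called singular vectors rather than eigenvectors of A. This plain-Python sketch assumes a 2-row A of full row rank, so that AA' is a 2x2 symmetric matrix with a closed-form eigen-decomposition. The helper names are hypothetical, and the rows of A reuse Jake's example vectors.

```python
import math


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sym2x2_eig(p, q, r):
    """Eigenpairs of the symmetric matrix [[p, q], [q, r]], largest eigenvalue first."""
    if q == 0.0:
        return sorted([(p, [1.0, 0.0]), (r, [0.0, 1.0])], reverse=True)
    m = (p + r) / 2.0
    d = math.hypot((p - r) / 2.0, q)
    pairs = []
    for lam in (m + d, m - d):
        x, y = q, lam - p  # (q, lam - p) lies in the null space of [[p, q], [q, r]] - lam*I
        n = math.hypot(x, y)
        pairs.append((lam, [x / n, y / n]))
    return pairs


def singular_triplets(A):
    """(sigma, u, v) for a 2-row matrix A of full row rank, via the eigenpairs of AA'."""
    (p, q), (_, r) = matmul(A, transpose(A))
    triplets = []
    for lam, u in sym2x2_eig(p, q, r):
        sigma = math.sqrt(lam)                            # singular value = sqrt(eigenvalue of AA')
        v = [x / sigma for x in matvec(transpose(A), u)]  # right singular vector v = A'u / sigma
        triplets.append((sigma, u, v))
    return triplets


# Rows are the two example vectors from Jake's reply.
A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]

for sigma, u, v in singular_triplets(A):
    Av = matvec(A, v)
    residual = max(abs(a - sigma * b) for a, b in zip(Av, u))  # checks A v = sigma u
    norm_v = math.sqrt(sum(x * x for x in v))
    print(f"sigma={sigma:.4f}  |v|={norm_v:.4f}  residual={residual:.1e}")
```

Here AA' = [[2, 1], [1, 2]] has eigenvalues 3 and 1, so the singular values are sqrt(3) and 1.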
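Fernando's pointer to non-negative matrix factorization can be illustrated with a small sketch of the Lee and Seung multiplicative updates, one standard NMF algorithm. The thread does not say which variant, if any, Mahout provides, and this is not Mahout code. The document-term matrix, rank k = 2, iteration count, `eps` and `seed` are arbitrary assumptions. Every document vector (row of W) is non-negative, so the cosine similarity between two documents in the reduced space cannot drop below 0.

```python
import math
import random


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(V, WH) for a, b in zip(ra, rb))


def nmf(V, k, iterations=200, eps=1e-9, seed=0):
    """Factor a non-negative m x n matrix V as W (m x k) times H (k x n), W and H non-negative.

    Lee and Seung multiplicative updates for the squared Frobenius error; eps guards
    against division by zero. The updates only multiply by non-negative ratios,
    so W and H stay non-negative.
    """
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)  # H <- H * (W'V) / (W'WH)
        H = [[h * a / (b + eps) for h, a, b in zip(*rows)] for rows in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))  # W <- W * (VH') / (WHH')
        W = [[w * a / (b + eps) for w, a, b in zip(*rows)] for rows in zip(W, num, den)]
    return W, H


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical term-count matrix: one row per document, one column per term.
V = [[2.0, 1.0, 0.0, 0.0],
     [1.0, 2.0, 0.0, 1.0],
     [0.0, 0.0, 3.0, 1.0],
     [0.0, 1.0, 2.0, 2.0]]

W, H = nmf(V, k=2)
print("error:", round(frobenius_error(V, W, H), 4))
for i in range(len(W)):
    for j in range(i + 1, len(W)):
        print(i, j, round(cosine(W[i], W[j]), 3))  # never negative: rows of W are non-negative
```

The trade-off is that NMF gives up the orthogonality and the optimal-reconstruction guarantee of a truncated SVD in exchange for non-negative, often more interpretable factors.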