On Wed, Jun 15, 2011 at 10:06 AM, Stefan Wienert <ste...@wienert.cc> wrote:
> Hmm. Seems I have plenty of negative results (nearly half of the
> similarities). I can add +0.3; then the most negative results are
> near 0. This is not optimal...
> I can project the results to [0..1].

Looking for *dissimilar* results seems odd. What are you trying to do?
What people normally do is look for clusters of similar documents, or
just the top-N most similar documents to each document. In both of
these cases, you don't care about the documents whose similarity to
any other document is zero or less.

  -jake

> Any other suggestions or comments?
>
> Cheers
> Stefan
>
> 2011/6/15 Jake Mannix <jake.man...@gmail.com>:
> > While your original vectors never had similarity less than zero, after
> > projection onto the SVD space, you may "project away" similarities
> > between two vectors, and they are now negatively correlated in this
> > space (think about projecting (1,0,1) and (0,1,1) onto the 1-d vector
> > space spanned by (1,-1,0): they go from having similarity +1/2
> > to similarity -1).
> >
> > I always interpret all similarities <= 0 as "maximally dissimilar",
> > even if technically -1 is where this is exactly true.
> >
> >   -jake
> >
> > On Wed, Jun 15, 2011 at 2:10 AM, Stefan Wienert <ste...@wienert.cc> wrote:
> >
> >> Ignoring is no option... so I have to interpret these values.
> >> Can one say that documents with similarity = -1 are the least similar
> >> documents? I don't think this is right.
> >> Any other assumptions?
> >>
> >> 2011/6/15 Fernando Fernández <fernando.fernandez.gonza...@gmail.com>:
> >> > One question that I think has not been answered yet is that of the
> >> > negative similarities. In the literature you can find that
> >> > similarity = -1 means that "documents talk about opposite topics",
> >> > but I think this is quite an abstract idea... I just ignore them;
> >> > when I'm trying to find the top-k similar documents these surely
> >> > won't be useful. I read recently that this has to do with the
> >> > assumptions in SVD, which is designed for normal distributions
> >> > (this implies the possibility of negative values). There are other
> >> > techniques (non-negative factorization) that try to solve this. I
> >> > don't know if there's something in Mahout about this.
> >> >
> >> > Best,
> >> >
> >> > Fernando.
> >> >
> >> > 2011/6/15 Ted Dunning <ted.dunn...@gmail.com>
> >> >
> >> >> The normal terminology is to name U and V in SVD as "singular
> >> >> vectors" as opposed to eigenvectors. The term eigenvectors is
> >> >> normally reserved for the symmetric case of U S U' (more
> >> >> generally, the Hermitian case, but we only support real values).
> >> >>
> >> >> On Wed, Jun 15, 2011 at 12:35 AM, Dmitriy Lyubimov
> >> >> <dlie...@gmail.com> wrote:
> >> >>
> >> >> > I beg to differ... U and V are left and right eigenvectors, and
> >> >> > the singular values are denoted as Sigma (which are the square
> >> >> > roots of the eigenvalues of AA', as you correctly pointed out).
> >>
> >> --
> >> Stefan Wienert
> >>
> >> http://www.wienert.cc
> >> ste...@wienert.cc
> >>
> >> Telefon: +495251-2026838
> >> Mobil: +49176-40170270
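
To make the projection example from the thread concrete, here is a minimal sketch in plain Python (not Mahout code). The helper names `cosine`, `project` and `clamped_similarity` are made up for illustration. It computes the cosine similarity of (1,0,1) and (0,1,1) before and after projecting onto the line spanned by (1,-1,0), and then applies the "treat everything <= 0 as maximally dissimilar" rule as a simple clamp at zero. The original vectors have cosine similarity 1/2; after projection it is -1.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def project(v, basis):
    """Coordinates of v in the subspace spanned by orthonormal basis vectors."""
    return [dot(v, b) for b in basis]


def clamped_similarity(u, v):
    """Hypothetical helper: treat every similarity <= 0 as 0 ("maximally dissimilar")."""
    return max(0.0, cosine(u, v))


a = (1.0, 0.0, 1.0)
b = (0.0, 1.0, 1.0)

# Unit vector along (1, -1, 0), standing in for a 1-d "SVD space".
norm = math.sqrt(2.0)
axis = (1.0 / norm, -1.0 / norm, 0.0)

before = cosine(a, b)        # similarity in the original 3-d space
pa = project(a, [axis])      # 1-d coordinate of a
pb = project(b, [axis])      # 1-d coordinate of b
after = cosine(pa, pb)       # similarity after projection

print(f"before projection: {before:.3f}")
print(f"after projection:  {after:.3f}")
print(f"clamped:           {clamped_similarity(pa, pb):.3f}")
```

Clamping keeps top-N and clustering code simple, since every non-positive score collapses to the same "not similar" value. Shifting by a constant such as +0.3 does not have that effect: it changes all the scores and still leaves the most negative pairs below zero.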