Re: Can all the algorithms in Mahout be run locally without a Hadoop cluster.

2011-06-24 Thread Stefan Wienert
Typically, they are all standalone... 2011/6/24 XiaoboGu : > We will put a big SMP server to deploy Mahout. > > Regards, > > Xiaobo Gu > > -- Stefan Wienert http://www.wienert.cc ste...@wienert.cc Telefon: +495251-2026838 Mobil: +49176-40170270

Re: tf-idf + svd + cosine similarity

2011-06-15 Thread Stefan Wienert
ned by (1,-1,0) - they go from having similarity +1/sqrt(2) > to similarity -1). > > I always interpret all similarities <= 0 as "maximally dissimilar", > even if technically -1 is where this is exactly true. > >  -jake > > On Wed, Jun 15, 2011 at 2:10 AM, Ste

Re: tf-idf + svd + cosine similarity

2011-06-15 Thread Stefan Wienert
>> symmetric case of U S U' (more generally, the Hermitian case, but we only >> support real values). >> >> On Wed, Jun 15, 2011 at 12:35 AM, Dmitriy Lyubimov > >wrote: >> >> > I beg to differ... U and V are left and right eigenvectors, and >>

Re: tf-idf + svd + cosine similarity

2011-06-14 Thread Stefan Wienert
e similarities may be different. I assume > you used normalized (true) eigenvectors from ssvd. > > Also would be interesting to know what oversampling parameter (p) you > used. > > Thanks. > -d > > > On Tue, Jun 14, 2011 at 2:04 PM, Stefan Wienert wrote: >> So.

Re: tf-idf + svd + cosine similarity

2011-06-14 Thread Stefan Wienert
one last question: for cosine similarity, sometimes the results are negative (which means the angle between the vectors is greater than 90°). But what does this mean for the similarity? Cheers, Stefan 2011/6/14 Stefan Wienert : > So... let's check the dimensions: > > First step: Lucene Outp
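As a rough illustration of the question above (not code from the thread): the cosine of the angle between two vectors is negative exactly when the angle exceeds 90°. The helper name `cosine_similarity` is made up for this sketch, and it uses plain Python rather than any Mahout API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Angle of 45 degrees: positive similarity.
sim_acute = cosine_similarity([1, 0], [1, 1])
# Angle of 90 degrees: similarity of zero.
sim_right = cosine_similarity([1, 0], [0, 1])
# Angle of 135 degrees: negative similarity.
sim_obtuse = cosine_similarity([1, 0], [-1, 1])
```

Vectors with only non-negative entries, such as raw tf-idf rows, cannot produce a negative cosine, because their dot product is never negative. Negative values can only show up once a projection (for example onto SVD components) introduces negative coordinates.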

Re: tf-idf + svd + cosine similarity

2011-06-14 Thread Stefan Wienert
ds "high" to me for anything >> but a few dimensions. What's the dimensionality of the input without >> dimension reduction? >> >> Something is amiss in this pipeline. It is an interesting question! >> >> On Tue, Jun 14, 2011 a

Re: tf-idf + svd + cosine similarity

2011-06-14 Thread Stefan Wienert
ou see quite high average >>> similarity with no dimension reduction! >>> >>> An average cosine similarity of 0.87 sounds "high" to me for anything >>> but a few dimensions. What's the dimensionality of the input without >>> dimension red

Re: tf-idf + svd + cosine similarity

2011-06-14 Thread Stefan Wienert
says: "distributed implementation of cosine similarity that does not center its data" So... this seems to be the similarity and not the distance? Cheers, Stefan 2011/6/14 Stefan Wienert : > but... why do I get the different results with cosine similarity with > no dimension re

Re: tf-idf + svd + cosine similarity

2011-06-14 Thread Stefan Wienert
> 2011/6/14 Jake Mannix > >> actually, wait - are your graphs showing *similarity*, or *distance*? In >> higher >> dimensions, *distance* (and cosine angle) should grow, but on the other >> hand, >> *similarity* (1-cos(angle)) should go toward 0. >> >> O

tf-idf + svd + cosine similarity

2011-06-14 Thread Stefan Wienert
Hey Guys, I have some strange results in my LSA pipeline. First, let me explain the steps my data goes through: 1) Extract the term-document matrix (TDM) from a Lucene datastore, using TF-IDF as the weighting 2) Transpose the TDM 3a) Run Mahout SVD (Lanczos) on the transposed TDM 3b) Run Mahout SSVD (stochastic SVD)

Re: Need a little help with SVD / Dimensional Reduction

2011-06-07 Thread Stefan Wienert
'modify' your program, especially if you need > to show practical result within a week. but you might help me by > creating a version of your program that takes it and we can see > together what it takes to get it working for you. > > -d > > On Tue, Jun 7, 2011 at 1:01 PM, Stef

Re: Need a little help with SVD / Dimensional Reduction

2011-06-07 Thread Stefan Wienert
Before I rewrite my program, is there any advantage over the Lanczos SVD? 2011/6/7 Dmitriy Lyubimov : > I am saying i did not test it with 0.20.2 > > Yes it is integrated in 0.5 release but there might be problems with > hadoop 0.20.2 > > On Tue, Jun 7, 2011 at 12:55 PM, Ste

Re: Need a little help with SVD / Dimensional Reduction

2011-06-07 Thread Stefan Wienert
it may require a number of > patches before it works for you. > > Here is (a little bit too wordy) command line manual for Mahout 0.5. > http://weatheringthrutechdays.blogspot.com/2011/03/ssvd-command-line-usage.html > > Thanks. > > -D > > > On Mon, Jun 6, 2

Re: Need a little help with SVD / Dimensional Reduction

2011-06-06 Thread Stefan Wienert
alternating least squares > which gives you two lower rank matrices which simulates the large decomposed > matrix? > > On Mon, Jun 6, 2011 at 1:30 PM, Stefan Wienert wrote: > >> Hi Danny! >> >> I understand that for M*M' (and for M'*M) the left and

Re: Need a little help with SVD / Dimensional Reduction

2011-06-06 Thread Stefan Wienert
Wikipedia text: When M is also positive > semi-definite<http://en.wikipedia.org/wiki/Positive-definite_matrix>, > the decomposition M = U D U* is also a singular value decomposition. > So you don't need to be worried about the other singular vectors. > > Hope this helps!
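The statement quoted above can be tied back to the M*M' question in this thread with a short derivation. This is a standard identity, given here only as a sketch, and it assumes a real matrix M with SVD M = U S V^T, where U and V are orthogonal and S is diagonal with non-negative entries:

```latex
M = U S V^{T}
\quad\Longrightarrow\quad
M M^{T} = U S V^{T} V S^{T} U^{T} = U \,(S S^{T})\, U^{T},
\qquad
M^{T} M = V S^{T} U^{T} U S V^{T} = V \,(S^{T} S)\, V^{T}.
```

Both products are symmetric and positive semi-definite, so their eigendecomposition is also a singular value decomposition. The eigenvectors of M M^T are the left singular vectors of M, those of M^T M are the right singular vectors, and the eigenvalues are the squared singular values.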

Re: Need a little help with SVD / Dimensional Reduction

2011-06-06 Thread Stefan Wienert
on to eigenvalue decomposition. > > Hope this helps, > > Danny Bickson > > On Mon, Jun 6, 2011 at 9:35 AM, Stefan Wienert wrote: > >> After reading this thread: >> >> http://mail-archives.apache.org/mod_mbox/mahout-user/201102.mbox/%3caanlktinq5k4xrm7nab

Re: Need a little help with SVD / Dimensional Reduction

2011-06-06 Thread Stefan Wienert
)* (because this is, what the calculation in the example is)? Thanks Stefan 2011/6/6 Stefan Wienert : > https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction > > What is done: > > Input: > tf-idf-matrix (docs x terms) 6076937 x 20444 > > "SVD" of

Re: Need a little help with SVD / Dimensional Reduction

2011-06-06 Thread Stefan Wienert
ache.org/mod_mbox/mahout-user/201102.mbox/%3CAANLkTi=rta7tfrm8zi60vcfya5xf+dbfrj8pcds2n...@mail.gmail.com%3E so my question: what is the output of the SVD in mahout. And what do I have to calculate to get the "right singular value" from svd? Thanks, Stefan 2011/6/6 Stefan Wienert : > h

Re: Need a little help with SVD / Dimensional Reduction

2011-06-06 Thread Stefan Wienert
: > Yes. These are term vectors, not document vectors. > > There is an additional step that can be run to produce document vectors. > > On Sun, Jun 5, 2011 at 1:16 PM, Stefan Wienert wrote: > >> compared to SVD, is the result the "right singular value"?

Re: Reading from Lucene, writing to IntWritable VectorWritable does not work

2011-06-05 Thread Stefan Wienert
No worries, fixed the problem by using another reader :) thanks anyway :) 2011/6/1 Lance Norskog : > Attachments don't work. Maybe you could use one of those file snippet sites? > > On Tue, May 31, 2011 at 8:09 AM, Stefan Wienert wrote: >> Hi Guys, >> >> got a

Need a little help with SVD / Dimensional Reduction

2011-06-05 Thread Stefan Wienert
so, why do you calculate tfidf-vectors^T * svdOut^T? I can't find an explanation for this myself. Compared to SVD, is the result the "right singular value"? I know it works, but I don't understand some of these steps. Please help... :) -- Stefan Wienert http://www.wienert.cc ste...@wie

Reading from Lucene, writing to IntWritable VectorWritable does not work

2011-05-31 Thread Stefan Wienert
Hi Guys, got a small problem here. Source code (running) is attached to this mail. So... I take a Lucene index and want to save the data as an IntWritable-VectorWritable-SequenceFile. There seems to be no problem, BUT after saving the data, I try to read it again: Class: class org.apache.mahout.mat

Re: How to convert SequenceFile to SequenceFile?

2011-05-25 Thread Stefan Wienert
/25 Jake Mannix : > Did you rebuild your tfidf-vectors with trunk as well? > > On Wed, May 25, 2011 at 6:59 AM, Stefan Wienert wrote: > >> First, I use http://svn.apache.org/repos/asf/mahout/trunk, tested some >> minutes ago with the newest version. >> >>

Re: How to convert SequenceFile to SequenceFile?

2011-05-25 Thread Stefan Wienert
to use LongWritable instead of IntWritable? Is this problematic? 2011/5/25 Jake Mannix : > On Wed, May 25, 2011 at 6:14 AM, Stefan Wienert wrote: > >> So the real problem is, that "transpose" and "matrixmult" (maybe) >> still uses IntWritable instead of Lo

Re: How to convert SequenceFile to SequenceFile?

2011-05-25 Thread Stefan Wienert
" to make sure it's > positive. > > You would need to store the reverse mapping from int to long to get > your original values out later. And there is a tiny chance of > collision. > > On Wed, May 25, 2011 at 12:59 PM, Stefan Wienert wrote: >> Hi, >> >>
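A minimal sketch of the idea described in this reply, under stated assumptions: the hash actually suggested in the thread is cut off in this snippet, so the folding scheme below (XOR of the upper and lower 32 bits, then masking off the sign bit) is only one plausible choice. The names `long_to_int_key` and `build_reverse_mapping` are hypothetical and not part of Mahout or Hadoop.

```python
def long_to_int_key(value):
    """Fold a 64-bit key into a non-negative 31-bit key (assumed scheme)."""
    u = value & 0xFFFFFFFFFFFFFFFF          # treat the key as unsigned 64-bit
    folded = (u ^ (u >> 32)) & 0xFFFFFFFF   # mix upper and lower halves
    return folded & 0x7FFFFFFF              # clear the sign bit: always >= 0

def build_reverse_mapping(long_keys):
    """Map each int key back to its original long key, failing on collisions."""
    reverse = {}
    for key in long_keys:
        int_key = long_to_int_key(key)
        if int_key in reverse and reverse[int_key] != key:
            raise ValueError(
                f"collision: {key} and {reverse[int_key]} both map to {int_key}"
            )
        reverse[int_key] = key
    return reverse

# Small keys map to themselves under this scheme.
mapping = build_reverse_mapping([1, 2, 3])
```

Keeping the reverse dictionary is what lets the original long keys be recovered later. The explicit collision check makes the "tiny chance of collision" visible instead of silently overwriting a row.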

How to convert SequenceFile to SequenceFile?

2011-05-25 Thread Stefan Wienert
Hi, I need some help using Hadoop: I'm trying to do some dimensional reduction following this tutorial: https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction I created my tf-idf-vectors from text saved in Lucene: https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors