Typically, they are all standalone...
2011/6/24 XiaoboGu :
> We will put a big SMP server to deploy Mahout.
>
> Regards,
>
> Xiaobo Gu
>
>
--
Stefan Wienert
http://www.wienert.cc
ste...@wienert.cc
Telefon: +495251-2026838
Mobil: +49176-40170270
ned by (1,-1,0) - they go from having similarity +1/sqrt(2)
> to similarity -1).
>
> I always interpret all similarities <= 0 as "maximally dissimilar",
> even if technically -1 is where this is exactly true.
>
> -jake
>
> On Wed, Jun 15, 2011 at 2:10 AM, Ste
>> symmetric case of U S U' (more generally, the Hermitian case, but we only
>> support real values).
>>
>> On Wed, Jun 15, 2011 at 12:35 AM, Dmitriy Lyubimov wrote:
>>
>> > I beg to differ... U and V are left and right eigenvectors, and
e similarities may be different. I assume
> you used normalized (true) eigenvectors from ssvd.
>
> Also, it would be interesting to know what oversampling parameter (p) you
> used.
>
> Thanks.
> -d
>
>
> On Tue, Jun 14, 2011 at 2:04 PM, Stefan Wienert wrote:
>> So.
one last question: for cosine similarity, sometimes the results are
negative (which means the angle between the vectors is greater than 90°). But
what does this mean for the similarity?
Cheers,
Stefan
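A minimal illustration of the negative-cosine case (pure Python, toy 2-D vectors; not Mahout code):

```python
import math

def cosine_similarity(a, b):
    """Plain (uncentered) cosine similarity: the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Orthogonal vectors (90 degrees apart) have similarity 0 ...
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0
# ... and an angle beyond 90 degrees gives a negative value.
print(cosine_similarity([1.0, 0.0], [-1.0, 1.0]))  # -1/sqrt(2), about -0.707
```

This matches the interpretation quoted above: any value <= 0 can be treated as "maximally dissimilar", with -1 the exact extreme.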
2011/6/14 Stefan Wienert :
> So... lets check the dimensions:
>
> First step: Lucene Outp
ds "high" to me for anything
>> but a few dimensions. What's the dimensionality of the input without
>> dimension reduction?
>>
>> Something is amiss in this pipeline. It is an interesting question!
>>
>> On Tue, Jun 14, 2011 a
you see quite high average
>>> similarity with no dimension reduction!
>>>
>>> An average cosine similarity of 0.87 sounds "high" to me for anything
>>> but a few dimensions. What's the dimensionality of the input without
>>> dimension red
says: "distributed implementation of cosine similarity that
does not center its data"
So... this seems to be the similarity and not the distance?
Cheers,
Stefan
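The "does not center its data" caveat is what separates plain cosine similarity from Pearson correlation; a small sketch of the difference (pure Python, toy vectors):

```python
import math

def cosine(a, b):
    """Uncentered cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def center(v):
    """Subtract the mean from every component."""
    m = sum(v) / len(v)
    return [x - m for x in v]

a = [3.0, 4.0, 5.0]
b = [4.0, 5.0, 6.0]
print(cosine(a, b))                  # close to 1 but inflated by the shared offset
print(cosine(center(a), center(b)))  # centered cosine = Pearson correlation: exactly 1 here
```

So an uncentered implementation (as quoted above) really is a similarity measure, just one that does not remove per-vector means first.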
2011/6/14 Stefan Wienert :
> but... why do I get the different results with cosine similarity with
> no dimension re
> 2011/6/14 Jake Mannix
>
>> actually, wait - are your graphs showing *similarity*, or *distance*? In
>> higher
>> dimensions, *distance* (and cosine angle) should grow, but on the other
>> hand,
>> *similarity* (cos(angle)) should go toward 0.
>>
>> O
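The high-dimensional claim quoted above can be checked empirically (pure-Python sketch with random Gaussian vectors; "distance" here means Euclidean distance):

```python
import math
import random

random.seed(42)

def avg_cos_and_dist(dim, trials=200):
    """Average cosine similarity and Euclidean distance of random vector pairs."""
    cos_sum = dist_sum = 0.0
    for _ in range(trials):
        a = [random.gauss(0, 1) for _ in range(dim)]
        b = [random.gauss(0, 1) for _ in range(dim)]
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        cos_sum += dot / (na * nb)
        dist_sum += math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return cos_sum / trials, dist_sum / trials

cos_low, dist_low = avg_cos_and_dist(3)
cos_high, dist_high = avg_cos_and_dist(300)
print(dist_high > dist_low)   # distance grows with dimension
print(abs(cos_high) < 0.1)    # cosine similarity of random vectors heads toward 0
```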
Hey Guys,
I have some strange results in my LSA-Pipeline.
First, I explain the steps my data is making:
1) Extract the term-document matrix (TDM) from a Lucene datastore, using TF-IDF weighting
2) Transpose the TDM
3a) Run Mahout SVD (Lanczos) on the transposed TDM
3b) or run Mahout SSVD (stochastic SVD) on it instead
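The steps above can be sketched in miniature (numpy, toy shapes; the real matrix is 6076937 x 20444 and the real jobs are Mahout's distributed Lanczos/SSVD, not numpy):

```python
import numpy as np

# Toy stand-in for step 1: rows = documents, columns = terms, entries = tf-idf.
rng = np.random.default_rng(0)
tdm = rng.random((6, 4))          # 6 "documents" x 4 "terms"

# Step 2: transpose, so the solver factors the terms-x-docs matrix.
A = tdm.T                         # 4 x 6

# Step 3: rank-k SVD, A is approximately U @ diag(s) @ Vt.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Reduced document representation: one column of Vt_k per document.
docs_reduced = Vt_k.T             # 6 docs x k dimensions
print(docs_reduced.shape)         # (6, 2)
```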
'modify' your program, especially if you need
> to show practical results within a week. But you might help me by
> creating a version of your program that takes it, and we can see
> together what it takes to get it working for you.
>
> -d
>
> On Tue, Jun 7, 2011 at 1:01 PM, Stef
Before I rewrite my program: is there any advantage over the Lanczos SVD?
2011/6/7 Dmitriy Lyubimov :
> I am saying I did not test it with 0.20.2.
>
> Yes it is integrated in 0.5 release but there might be problems with
> hadoop 0.20.2
>
> On Tue, Jun 7, 2011 at 12:55 PM, Ste
it may require a number of
> patches before it works for you.
>
> Here is (a little bit too wordy) command line manual for Mahout 0.5.
> http://weatheringthrutechdays.blogspot.com/2011/03/ssvd-command-line-usage.html
>
> Thanks.
>
> -D
>
>
> On Mon, Jun 6, 2
alternating least squares,
> which gives you two lower-rank matrices whose product approximates the large
> decomposed matrix?
>
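For reference, the alternating-least-squares idea mentioned above in a minimal unregularized numpy sketch (real recommender ALS adds regularization and handles missing entries; this only shows the two-low-rank-factors mechanic):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.random((8, 6))   # the "large" matrix we want to factor
k = 3                    # target rank

# Alternating least squares: hold one factor fixed, solve a least-squares
# problem for the other, and repeat, so that X (8 x k) @ Y (k x 6) approximates M.
X = rng.random((8, k))
Y = rng.random((k, 6))
for _ in range(50):
    Y = np.linalg.lstsq(X, M, rcond=None)[0]        # best Y given X
    X = np.linalg.lstsq(Y.T, M.T, rcond=None)[0].T  # best X given Y

err = np.linalg.norm(M - X @ Y) / np.linalg.norm(M)
print(err)  # relative error of the rank-3 approximation
```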
> On Mon, Jun 6, 2011 at 1:30 PM, Stefan Wienert wrote:
>
>> Hi Danny!
>>
>> I understand that for M*M' (and for M'*M) the left and
Wikipedia text: When M is also positive
> semi-definite <http://en.wikipedia.org/wiki/Positive-definite_matrix>,
> the decomposition M = U D U* is also a singular value decomposition.
> So you don't need to worry about the other singular vectors.
>
> Hope this helps!
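That Wikipedia statement is easy to check numerically (numpy sketch; B @ B.T is just a convenient way to build a positive semi-definite matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.random((4, 4))
M = B @ B.T                          # M = B B^T is symmetric positive semi-definite

# Eigendecomposition M = U D U^T (eigh returns eigenvalues in ascending order).
eigvals, eigvecs = np.linalg.eigh(M)

# Singular values from the SVD (returned in descending order).
sing = np.linalg.svd(M, compute_uv=False)

# For a PSD matrix the singular values ARE the eigenvalues,
# so the eigendecomposition doubles as an SVD.
print(np.allclose(np.sort(eigvals)[::-1], sing))  # True
```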
on to eigenvalue decomposition.
>
> Hope this helps,
>
> Danny Bickson
>
> On Mon, Jun 6, 2011 at 9:35 AM, Stefan Wienert wrote:
>
>> After reading this thread:
>>
>> http://mail-archives.apache.org/mod_mbox/mahout-user/201102.mbox/%3caanlktinq5k4xrm7nab
)* (because this is what the calculation in the example is)?
Thanks
Stefan
2011/6/6 Stefan Wienert :
> https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction
>
> What is done:
>
> Input:
> tf-idf-matrix (docs x terms) 6076937 x 20444
>
> "SVD" of
ache.org/mod_mbox/mahout-user/201102.mbox/%3CAANLkTi=rta7tfrm8zi60vcfya5xf+dbfrj8pcds2n...@mail.gmail.com%3E
so my question: what is the output of the SVD in Mahout? And what do I
have to calculate to get the "right singular values" from the SVD?
Thanks,
Stefan
2011/6/6 Stefan Wienert :
> h
:
> Yes. These are term vectors, not document vectors.
>
> There is an additional step that can be run to produce document vectors.
>
> On Sun, Jun 5, 2011 at 1:16 PM, Stefan Wienert wrote:
>
>> compared to SVD, is the result the "right singular value"?
No worries, fixed the problem by using another reader :)
thanks anyway :)
2011/6/1 Lance Norskog :
> Attachments don't work. Maybe you could use one of those file snippet sites?
>
> On Tue, May 31, 2011 at 8:09 AM, Stefan Wienert wrote:
>> Hi Guys,
>>
>> got a
so, why do you calculate tfidf-vectors^T * svdOut^T? I cannot find
an explanation for it myself.
compared to SVD, is the result the "right singular value"?
I know it works, but I don't understand some of these steps. Please help... :)
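The tfidf^T * svdOut^T step can be motivated with a small sketch (numpy, toy shapes; "svdOut" is assumed here to hold the right singular, i.e. term, vectors, which matches the term-vectors remark earlier in the thread):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((5, 3))            # tf-idf matrix: 5 docs x 3 terms

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A = U S V^T, so A @ V = U S: multiplying the tf-idf matrix by the
# term (right singular) vectors projects each document into LSA space.
doc_vectors = A @ Vt.T
print(np.allclose(doc_vectors, U * s))   # True: these are U scaled by the singular values
```

So the transposes in the Mahout command line are just arranging the same product, mapping term vectors into document vectors.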
--
Stefan Wienert
http://www.wienert.cc
ste...@wie
Hi Guys,
got a small problem here. Source code (running) is attached to this mail.
So... I take a Lucene index and want to save the data as an
IntWritable/VectorWritable SequenceFile.
There seems to be no problem, BUT after saving the data, I try to read
it again:
Class: class org.apache.mahout.mat
2011/5/25 Jake Mannix :
> Did you rebuild your tfidf-vectors with trunk as well?
>
> On Wed, May 25, 2011 at 6:59 AM, Stefan Wienert wrote:
>
>> First, I use http://svn.apache.org/repos/asf/mahout/trunk, tested some
>> minutes ago with the newest version.
>>
>>
to use
LongWritable instead of IntWritable? Is this problematic?
2011/5/25 Jake Mannix :
> On Wed, May 25, 2011 at 6:14 AM, Stefan Wienert wrote:
>
>> So the real problem is that "transpose" and "matrixmult" (maybe)
>> still use IntWritable instead of LongWritable
to make sure it's
> positive.
>
> You would need to store the reverse mapping from int to long to get
> your original values out later. And there is a tiny chance of
> collision.
>
> On Wed, May 25, 2011 at 12:59 PM, Stefan Wienert wrote:
>> Hi,
>>
>>
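The quoted workaround can be sketched like this (Python; `long_to_int_key` is a hypothetical helper for illustration, not a Mahout or Hadoop API):

```python
def long_to_int_key(key, table=None):
    """Fold a 64-bit long into a non-negative 31-bit int key, keeping a
    reverse mapping so the original value can be recovered later.
    As noted in the quoted mail, collisions are possible (though rare)."""
    h = (key ^ (key >> 32)) & 0x7FFFFFFF   # mix both halves, force non-negative
    if table is not None:
        table.setdefault(h, key)           # remember int -> original long
    return h

reverse = {}
small = long_to_int_key(123456789012345, reverse)
print(0 <= small < 2**31)                  # True: fits a positive Java int
print(reverse[small])                      # the original long comes back out
```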
Hi,
I need some help using Hadoop :
I'm trying to do some dimensional reduction following this tutorial:
https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction
I created my tf-idf-vectors from text saved in lucene:
https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors