On Fri, Jun 29, 2012 at 1:13 AM, Nowal, Akshay
akshay_no...@syntelinc.com wrote:
I am at a beginner level in using Mahout and am planning to build a
classifier on customer data to classify churners and non-churners using
a support vector machine (SVM).
The easiest way to do this is to add a
Hi all,
I'm trying to implement Latent Semantic Indexing using the Mahout ssvd
tool, and I'm having trouble understanding how I can use the output of
Mahout's ssvd to 'fold' new queries (documents) into the LSI space. Specifically,
I can't find a way to multiply a vector representing a query by the
Well the inverse of a diagonal matrix like that is just going to be a
diagonal matrix holding the reciprocals (1/x) of the values. That much
is easy. But you need to invert more than that to fold in.
I admit even I don't know the details of the Mahout implementation
you're using, but I imagine
Thanks for the quick response. So I will create a new diagonal matrix with
the reciprocals of the eigenvalues, and multiply by that. I took a look at
the slides (very nice presentation!), but it seems that I won't even need
to go this far, as I should be able to take E^(-1) x U^(T) x docvector,
Yes. The fold-in formula is given in the link you mentioned, formulas
(2) and (3), of which you probably need only one, depending on which
way you are going. Usually you are folding in new documents (rows of
U), so you need formula (2) to add new folded-in rows.
Also, as the comment implies, your
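For readers following along, the row fold-in can be derived directly from the decomposition. This is only a sketch: it assumes documents are rows of A and that A is approximated by U Σ V^T with orthonormal columns in V. The symbols a_new and u_new are introduced here for illustration, and whether the last line matches formula (2) of the linked document term for term is an assumption.

```latex
A \approx U \Sigma V^{T}
\quad\Rightarrow\quad
A V \approx U \Sigma \qquad (\text{since } V^{T} V = I)
\quad\Rightarrow\quad
U \approx A V \Sigma^{-1}

% A new document row a_new (1 x n) therefore folds in as a new row of U:
u_{\text{new}} = a_{\text{new}} \, V \, \Sigma^{-1}
```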
Yes the two are saying the same thing in different ways.
What you really need, to project a new column of A into V, is the
(pseudo-)inverse (U * sigma)^-1. This would be sigma^-1 * U^-1. Here
U^-1 = U^T because the SVD gives you orthonormal bases in U and V --
that's a nice property of what the
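The column fold-in described above, sigma^-1 * U^T * d, can be sketched in a few lines of plain Python. This is a toy illustration, not Mahout code: the helper names (`matvec`, `transpose`, `fold_in_column`, `reconstruct`) and the small hand-built U, sigma, and V are hypothetical, chosen so that U and V have orthonormal columns. Folding a column of A back in should recover the matching row of V.

```python
import math


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    """Return the transpose of matrix M."""
    return [list(col) for col in zip(*M)]


def fold_in_column(U, sigma, d):
    """Project a new column d of A into the learned space: sigma^-1 * U^T * d.

    sigma is the list of diagonal values, so its inverse is just the
    reciprocals, applied here as an element-wise division.
    """
    Ut_d = matvec(transpose(U), d)
    return [x / s for x, s in zip(Ut_d, sigma)]


def reconstruct(U, sigma, V):
    """Build A = U * diag(sigma) * V^T from the factors."""
    k = len(sigma)
    return [[sum(U[i][c] * sigma[c] * V[j][c] for c in range(k))
             for j in range(len(V))]
            for i in range(len(U))]


# Hypothetical toy factors: 3 terms, 2 documents, rank 2.
r = 1 / math.sqrt(2)
U = [[r, r], [r, -r], [0.0, 0.0]]   # orthonormal columns
sigma = [3.0, 1.0]                  # diagonal of sigma
V = [[0.6, 0.8], [0.8, -0.6]]       # orthonormal columns; row j = document j

A = reconstruct(U, sigma, V)
doc0 = [row[0] for row in A]        # first column of A (document 0)
folded = fold_in_column(U, sigma, doc0)
print(folded, V[0])                 # the two should agree up to rounding
```

A genuinely new query vector is projected the same way; any component of it that lies outside the span of U is simply dropped, which is why fold-in cannot learn new structure.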
PS: of course, folding in a considerable amount of new data is not
recommended, since when you fold in, you are not learning any new
semantic space. You are only able to project new documents into the
previously learned semantic space and keep measuring similarities to
them in that space.
(which
Thanks very much for the clarification and advice! I'm working with the
Wikipedia dataset, so I'm using a somewhat 'static' space, and the intent
of the queries is to use the context of a spotted surface form to select
the most similar resource (Wikipedia page) from a set of possible
Hello, I'm Pricila Rodrigues. I'm from Brazil.
I am developing a data mining project that involves the use of Hadoop and Mahout.
I'm following this example of running k-means:
https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data#FootnoteMarker1
When I type the
What is your input directory?
Are you running on HDFS or on a local machine?
On Jun 29, 2012, at 5:02 PM, pricila rr pricila...@gmail.com wrote:
Hello, I'm Pricila Rodrigues. I'm from Brazil.
I am developing a data mining project that involves the use of Hadoop and Mahout.
I'm following the example