Re: Support Vector Machine in Mahout

2012-06-29 Thread Ted Dunning
On Fri, Jun 29, 2012 at 1:13 AM, Nowal, Akshay akshay_no...@syntelinc.com wrote: I am at a beginner level in using Mahout and am planning to build a classifier on customer data to classify churners and non-churners using a support vector machine (SVM). The easiest way to do this is to add a

LSI using Mahout ssvd - folding a new doc into the space

2012-06-29 Thread Chris Hokamp
Hi all, I'm trying to implement Latent Semantic Indexing using the Mahout ssvd tool, and I'm having trouble understanding how I can use the output of Mahout ssvd to 'fold' new queries (documents) into the LSI space. Specifically, I can't find a way to multiply a vector representing a query by the

Re: LSI using Mahout ssvd - folding a new doc into the space

2012-06-29 Thread Sean Owen
Well, the inverse of a diagonal matrix like that is just going to be a diagonal matrix holding the reciprocals (1/x) of the values. That much is easy. But you need to invert more than that to fold in. I admit even I don't know the details of the Mahout implementation you're using, but I imagine

Re: LSI using Mahout ssvd - folding a new doc into the space

2012-06-29 Thread Chris Hokamp
Thanks for the quick response. So I will create a new diagonal matrix with the reciprocals of the eigenvalues, and multiply by that. I took a look at the slides (very nice presentation!), but it seems that I won't even need to go this far, as I should be able to take E^(-1) x U^(T) x docvector,

Re: LSI using Mahout ssvd - folding a new doc into the space

2012-06-29 Thread Dmitriy Lyubimov
Yes. The fold-in formula is given in the link you mentioned, formulas (2) and (3), of which you probably need only one, depending on which way you are going. Usually you are folding in new documents (rows of U), so you need formula (2) to add new folded-in rows. Also, as the comment implies, your

Re: LSI using Mahout ssvd - folding a new doc into the space

2012-06-29 Thread Sean Owen
Yes, the two are saying the same thing in different ways. What you really need, to project a new column of A into V, is the (pseudo-)inverse (U * sigma)^-1. This would be sigma^-1 * U^-1. Here U^-1 = U^T because the SVD gives you orthonormal bases in U and V -- that's a nice property of what the
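
The fold-in described in this message can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the factors U, sigma and V below are toy values invented for the example (they are not Mahout ssvd output), and fold_in and column_of_a are hypothetical helper names, not part of any Mahout API. The sketch follows the convention used in the message, where a document is a column of A = U * sigma * V^T, so folding it in computes sigma^-1 * U^T * doc.

```python
def fold_in(U, sigma, doc):
    """Project a term-space vector into the latent space: sigma^-1 * U^T * doc.

    U     -- list of rows, shape (terms x k), with orthonormal columns
    sigma -- list of the k singular values (the diagonal of sigma)
    doc   -- list of term weights for the new document, length = terms
    """
    k = len(sigma)
    # U^T * doc: one dot product per latent dimension.
    projected = [sum(U[i][j] * doc[i] for i in range(len(doc))) for j in range(k)]
    # sigma is diagonal, so its inverse is just the reciprocals.
    return [projected[j] / sigma[j] for j in range(k)]


def column_of_a(U, sigma, v_row):
    """Rebuild one column of A = U * sigma * V^T from the matching row of V."""
    k = len(sigma)
    return [sum(U[i][j] * sigma[j] * v_row[j] for j in range(k)) for i in range(len(U))]


# Toy factors (assumed values): 3 terms, 2 latent dimensions, 2 documents.
U = [[1.0, 0.0],
     [0.0, 0.6],
     [0.0, 0.8]]
sigma = [2.0, 0.5]
V = [[0.6, 0.8],     # each row holds one document's latent coordinates
     [0.8, -0.6]]

# Folding in a document that is already a column of A should give back
# that document's row of V (up to floating-point rounding).
doc0 = column_of_a(U, sigma, V[0])
coords0 = fold_in(U, sigma, doc0)
```

A new query vector built with the same term weighting as the original matrix would be passed to fold_in in the same way, and its coordinates compared (for example by cosine similarity) against the rows of V.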

Re: LSI using Mahout ssvd - folding a new doc into the space

2012-06-29 Thread Dmitriy Lyubimov
PS: Of course, folding in a considerable amount of new data is not recommended, since when you fold in, you are not learning any new semantic space. You are only able to project new documents into the previously learned semantic space and keep measuring similarities to them in that space. (which

Re: LSI using Mahout ssvd - folding a new doc into the space

2012-06-29 Thread Chris Hokamp
Thanks very much for the clarification and advice! I'm working with the Wikipedia dataset, so I'm using a somewhat 'static' space, and the intent of the queries is to use the context of a spotted surface form to select the most similar resource (Wikipedia page) from a set of possible

MAHOUT: error CardinalityException

2012-06-29 Thread pricila rr
Hello, I'm Pricila Rodrigues. I'm from Brazil. I am developing a Data Mining project that involves the use of Hadoop and Mahout. I'm following this example of running k-means: https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data#FootnoteMarker1 When I type the

Re: MAHOUT: error CardinalityException

2012-06-29 Thread Aniruddha Basak
What is your input directory? Are you running on HDFS or on a local machine? On Jun 29, 2012, at 5:02 PM, pricila rr pricila...@gmail.com wrote: Hello, I'm Pricila Rodrigues. I'm from Brazil. I am developing a Data Mining project that involves the use of Hadoop and Mahout. I'm following the example