Re: Vector truncation for visualization

2011-06-08 Thread Ted Dunning
On Thu, Jun 9, 2011 at 2:27 AM, Lance Norskog wrote: > Projecting to the first "two" singular vectors? Yes. > Do an SVD on a random matrix, and use the first 2 (or three) singular > vectors as a matrix? Not a random matrix. A matrix of positions shifted to have zero mean (aka PCA). >
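Ted's answer amounts to PCA. You center the rows so that each column has zero mean, find the top two singular directions, and use each row's coordinates on them as a 2-D layout. The sketch below is a minimal illustration and is not Mahout code. It works on small dense arrays and finds the directions by power iteration with deflation, a simple stand-in for a real (distributed) SVD. The class Pca2d and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch: PCA-style projection of dense rows to 2-D (hypothetical helper, not Mahout's API). */
final class Pca2d {
    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // Subtract each column's mean so the cloud of points is centered at the origin.
    static double[][] center(double[][] x) {
        int n = x.length, d = x[0].length;
        double[] mean = new double[d];
        for (double[] row : x) for (int j = 0; j < d; j++) mean[j] += row[j] / n;
        double[][] c = new double[n][d];
        for (int i = 0; i < n; i++) for (int j = 0; j < d; j++) c[i][j] = x[i][j] - mean[j];
        return c;
    }

    // Top-k right singular vectors of c: power iteration on c^T c, deflating against earlier vectors.
    static double[][] topDirections(double[][] c, int k, int iters) {
        int d = c[0].length;
        double[][] v = new double[k][d];
        Random rnd = new Random(42);
        for (int p = 0; p < k; p++) {
            for (int j = 0; j < d; j++) v[p][j] = rnd.nextGaussian();
            for (int it = 0; it < iters; it++) {
                double[] w = new double[d];
                for (double[] row : c) {            // w = c^T (c v)
                    double s = dot(row, v[p]);
                    for (int j = 0; j < d; j++) w[j] += row[j] * s;
                }
                for (int q = 0; q < p; q++) {       // remove directions already found
                    double s = dot(w, v[q]);
                    for (int j = 0; j < d; j++) w[j] -= s * v[q][j];
                }
                double norm = Math.sqrt(dot(w, w));
                if (norm == 0) {                    // no variance left: use a zero direction
                    Arrays.fill(v[p], 0);
                    break;
                }
                for (int j = 0; j < d; j++) v[p][j] = w[j] / norm;
            }
        }
        return v;
    }

    // Coordinates of each centered row on the top two directions: an n x 2 layout for plotting.
    static double[][] project2d(double[][] x, int iters) {
        double[][] c = center(x);
        double[][] v = topDirections(c, 2, iters);
        double[][] y = new double[c.length][2];
        for (int i = 0; i < c.length; i++) for (int p = 0; p < 2; p++) y[i][p] = dot(c[i], v[p]);
        return y;
    }

    public static void main(String[] args) {
        double[][] x = {{-2, -4, 0.1}, {-1, -2, -0.1}, {0, 0, 0.1}, {1, 2, -0.1}, {2, 4, 0.1}};
        for (double[] row : project2d(x, 100)) System.out.println(Arrays.toString(row));
    }
}
```

Centering is the step that separates this from "an SVD on a random matrix". Without it, the first singular vector mostly points at the data's mean rather than along its spread.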

Re: Confused on binary vs source distributions

2011-06-08 Thread Mark
Just tested seq2sparse using the binary distribution again and received: 11/06/08 21:17:00 INFO mapred.JobClient: Task Id : attempt_201106061352_0066_r_01_1, Status : FAILED Error: java.lang.ClassNotFoundException: org.apache.lucene.analysis.TokenStream at java.net.URLClassLoader$1.run(URLCl

Re: Vector truncation for visualization

2011-06-08 Thread Lance Norskog
Projecting to the first "two" singular vectors? Do an SVD on a random matrix, and use the first 2 (or three) singular vectors as a matrix? What goes into the affinity matrix? On Wed, Jun 8, 2011 at 4:24 PM, Ted Dunning wrote: > Projecting to the first two singular vectors is better. > > Forming a

Confused on binary vs source distributions

2011-06-08 Thread Mark
I explained in an earlier post that I was having problems running some examples on a cluster when using the binary distribution. My cluster was complaining about missing classes, i.e. the Lucene analyzer and Google Preconditions. However, when I tried the same thing on a src distribution (and after m

Problems running distributed seq2sparse

2011-06-08 Thread Mark
Hello all, I am trying to run seq2sparse as follows: bin/mahout seq2sparse \ -i clustering/items-seq \ -o clustering/items-vectors \ -wt tfidf \ -nr 3 \ -ng 3 \ -s 5 \ -md 3 \ -x 90 \ -ml 50 \ -ow The first tas

Re: Vector truncation for visualization

2011-06-08 Thread Ted Dunning
Projecting to the first two singular vectors is better. Forming an affinity (rather than distance) matrix and projecting to those coordinates is very interesting. On Thu, Jun 9, 2011 at 12:25 AM, Lance Norskog wrote: > I've used multi-dimensional scaling (MDS) in another toolkit to > down-proje
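The thread does not spell out what goes into the affinity matrix. This sketch assumes one common choice, a Gaussian kernel A[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)). That value is near 1 for nearby points and near 0 for distant ones, the reverse of a distance matrix. The sketch then normalizes A as D^-1/2 A D^-1/2, where D holds the row sums. It uses the leading non-trivial eigenvectors of the result as 2-D coordinates, a technique usually called spectral embedding. The kernel, sigma, the normalization and the class name AffinityEmbedding are illustrative assumptions, not a Mahout API.

```java
import java.util.Random;

/** Sketch: spectral embedding from a Gaussian affinity matrix (hypothetical helper). */
final class AffinityEmbedding {
    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    static double[] normalize(double[] v) {
        double norm = Math.sqrt(dot(v, v));
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) out[i] = v[i] / norm;
        return out;
    }

    // Gaussian affinity: close to 1 for nearby points, close to 0 for distant ones.
    static double[][] affinity(double[][] x, double sigma) {
        int n = x.length;
        double[][] a = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                double d2 = 0;
                for (int k = 0; k < x[i].length; k++) { double t = x[i][k] - x[j][k]; d2 += t * t; }
                a[i][j] = Math.exp(-d2 / (2 * sigma * sigma));
            }
        return a;
    }

    // n x dims layout from the leading non-trivial eigenvectors of M = D^-1/2 A D^-1/2.
    static double[][] embed(double[][] x, double sigma, int dims, int iters) {
        double[][] a = affinity(x, sigma);
        int n = a.length;
        double[] dInvSqrt = new double[n];
        for (int i = 0; i < n; i++) {
            double deg = 0;
            for (double val : a[i]) deg += val;
            dInvSqrt[i] = 1 / Math.sqrt(deg);
        }
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++) for (int j = 0; j < n; j++) m[i][j] = dInvSqrt[i] * a[i][j] * dInvSqrt[j];

        // basis[0] is the trivial eigenvector D^1/2 * 1 (eigenvalue 1); it carries no layout information.
        double[][] basis = new double[dims + 1][];
        double[] trivial = new double[n];
        for (int i = 0; i < n; i++) trivial[i] = 1 / dInvSqrt[i];
        basis[0] = normalize(trivial);

        Random rnd = new Random(7);
        for (int p = 1; p <= dims; p++) {
            double[] v = new double[n];
            for (int i = 0; i < n; i++) v[i] = rnd.nextGaussian();
            for (int it = 0; it < iters; it++) {
                double[] w = new double[n];
                for (int i = 0; i < n; i++) w[i] = dot(m[i], v);      // w = M v
                for (int q = 0; q < p; q++) {                          // deflate earlier eigenvectors
                    double s = dot(w, basis[q]);
                    for (int i = 0; i < n; i++) w[i] -= s * basis[q][i];
                }
                v = normalize(w);
            }
            basis[p] = v;
        }
        double[][] y = new double[n][dims];
        for (int i = 0; i < n; i++) for (int p = 0; p < dims; p++) y[i][p] = basis[p + 1][i];
        return y;
    }

    public static void main(String[] args) {
        double[][] x = {{0.0}, {0.1}, {0.2}, {10.0}, {10.1}, {10.2}};
        for (double[] row : embed(x, 1.0, 2, 200)) System.out.println(row[0] + " " + row[1]);
    }
}
```

A Gaussian affinity matrix is positive semidefinite, and so is its normalized form. That is why plain power iteration reaches the largest eigenvalues here. The matrix is n x n, so this dense version only suits a small sample of points.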

Re: Problems running examples

2011-06-08 Thread Hector Yee
I got a slightly different error on the next line of KMeansDriver.java (running on OS X Snow Leopard): 11/06/08 16:02:12 INFO compress.CodecPool: Got brand-new compressor Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.mahout.ma

Vector truncation for visualization

2011-06-08 Thread Lance Norskog
I've used multi-dimensional scaling (MDS) in another toolkit to down-project high-dim vectors to 2d and 3d. What tools for this are available in Mahout? Random Projection down to 2 dimensions is easy, but seems unsound. -- Lance Norskog goks...@gmail.com
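For comparison, the random projection Lance calls easy is a single matrix multiply. The sketch below uses a hypothetical class, RandomProjection, and is not Mahout code. It multiplies each row by a d x k matrix of Gaussian entries scaled by 1/sqrt(k), which preserves squared lengths in expectation. The Johnson-Lindenstrauss guarantee behind random projection needs k on the order of log(n)/eps^2. At k = 2, individual distances can be badly distorted, which is one concrete reason it "seems unsound" for visualization.

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch: Gaussian random projection of dense rows down to k dimensions (hypothetical helper). */
final class RandomProjection {
    static double[][] project(double[][] x, int k, long seed) {
        int d = x[0].length;
        Random rnd = new Random(seed);
        double[][] r = new double[d][k];   // d x k matrix with N(0, 1/k) entries
        for (int j = 0; j < d; j++)
            for (int p = 0; p < k; p++) r[j][p] = rnd.nextGaussian() / Math.sqrt(k);
        double[][] y = new double[x.length][k];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < d; j++)
                for (int p = 0; p < k; p++) y[i][p] += x[i][j] * r[j][p];   // y = x R
        return y;
    }

    public static void main(String[] args) {
        double[][] x = {{1, 0, 0, 0}, {0, 1, 0, 0}, {1, 1, 0, 0}};
        for (double[] row : project(x, 2, 123L)) System.out.println(Arrays.toString(row));
    }
}
```

Unlike the PCA projection, the directions here ignore the data entirely. That makes it cheap and streamable, but it will not choose the two directions with the most spread.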

Re: Computing SVD Of "Large Sparse Data"

2011-06-08 Thread Ted Dunning
I would encourage you to take a stab at a patch on this. You aren't the only person to have expressed interest in scaling PCA, but you aren't a member of a large horde, either. On Wed, Jun 8, 2011 at 7:39 AM, Eshwaran Vijaya Kumar wrote: > Thanks Ted. That is good news. > On Jun 7, 2011, at 11:1

Re: Need a little help with SVD / Dimensional Reduction

2011-06-08 Thread Dmitriy Lyubimov
I guess the only problem that creates such demand for CDH is the fact that the Hadoop project twisted everybody's arm by deprecating the entire MR API over what seem to be merely perceived OOA design issues rather than functional ones. Even that would've been OK if it weren't for the fact that they did no

Re: how to use Bayes classifier to predict

2011-06-08 Thread Grant Ingersoll
Have a look at the Classify class in the classifier package as a starting place. -Grant On Jun 8, 2011, at 4:32 AM, 刘逸哲 wrote: > Hi all, > There are trainclassifier and testclassifier, but I would like to > know how to make predictions on new text without any label. > I think the te

Re: Computing SVD Of "Large Sparse Data"

2011-06-08 Thread Eshwaran Vijaya Kumar
Thanks Ted. That is good news. On Jun 7, 2011, at 11:12 PM, Ted Dunning wrote: > I think that incorporating mean subtraction into the SSVD code should > be relatively straightforward. The trick is that you have to project > the original matrix and the mean separately and then combine the > results
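Ted's trick can be written as (A - 1 mu^T) Omega = A Omega - 1 (mu^T Omega). Here mu is the vector of column means, 1 is a column of ones, and Omega is the random projection matrix in a randomized SVD. The product A Omega touches only the non-zeros of A, and mu^T Omega is a single k-vector subtracted from every row. The dense, centered matrix is therefore never formed. The sketch below assumes sparse rows stored as Map<Integer, Double>. It uses hypothetical names and is not the SSVD code itself, only the projection step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Sketch: centered projection (A - 1 mu^T) Omega without densifying A (hypothetical helper). */
final class CenteredProjection {
    // Y = A * Omega, visiting only the non-zero entries of each sparse row.
    static double[][] projectSparse(List<Map<Integer, Double>> rows, double[][] omega) {
        int k = omega[0].length;
        double[][] y = new double[rows.size()][k];
        for (int i = 0; i < rows.size(); i++)
            for (Map.Entry<Integer, Double> e : rows.get(i).entrySet())
                for (int p = 0; p < k; p++) y[i][p] += e.getValue() * omega[e.getKey()][p];
        return y;
    }

    // mu = column means of A, also computed from the non-zeros only.
    static double[] columnMeans(List<Map<Integer, Double>> rows, int d) {
        double[] mu = new double[d];
        for (Map<Integer, Double> row : rows)
            for (Map.Entry<Integer, Double> e : row.entrySet()) mu[e.getKey()] += e.getValue() / rows.size();
        return mu;
    }

    // Project the matrix and the mean separately, then combine: A Omega - 1 (mu^T Omega).
    static double[][] centeredProjection(List<Map<Integer, Double>> rows, double[][] omega) {
        double[][] y = projectSparse(rows, omega);
        double[] mu = columnMeans(rows, omega.length);
        int k = omega[0].length;
        double[] muOmega = new double[k];
        for (int j = 0; j < mu.length; j++)
            for (int p = 0; p < k; p++) muOmega[p] += mu[j] * omega[j][p];
        for (double[] row : y)
            for (int p = 0; p < k; p++) row[p] -= muOmega[p];
        return y;
    }

    public static void main(String[] args) {
        List<Map<Integer, Double>> rows = new ArrayList<>();
        rows.add(Map.of(0, 1.0, 2, 3.0));
        rows.add(Map.of(1, 2.0));
        rows.add(Map.of(0, 4.0, 1, 1.0));
        double[][] omega = {{1, 0}, {0, 1}, {1, -1}};
        for (double[] r : centeredProjection(rows, omega)) System.out.println(Arrays.toString(r));
    }
}
```

For this 3 x 3 example, the result matches centering A densely and then multiplying by Omega. The rows come out as (4/3, -3), (-8/3, 2) and (4/3, 1), and each column sums to zero, as a centered projection should.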

how to use Bayes classifier to predict

2011-06-08 Thread 刘逸哲
Hi all, There are trainclassifier and testclassifier, but I would like to know how to make predictions on new text without any label. I think testclassifier must make a prediction first, but does the testclassifier interface need input texts with labels? Is there an easy
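Predicting on unlabeled text only needs the trained counts. You score each label by its log prior plus the log likelihood of the document's tokens, then take the highest score. A test step generally needs labels only to check its predictions against them. The sketch below is a self-contained multinomial naive Bayes example of that idea. It is not Mahout's Classify class or its model format. The class TinyNaiveBayes, its naive ASCII word tokenizer and the add-one smoothing are all assumptions. A real setup should tokenize new text the same way the training data was tokenized.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: multinomial naive Bayes with add-one smoothing (hypothetical, not Mahout's classifier). */
final class TinyNaiveBayes {
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    private final Map<String, Integer> totalTerms = new HashMap<>();
    private final Set<String> vocab = new HashSet<>();
    private int totalDocs = 0;

    // Training needs labels: accumulate per-label document and term counts.
    void train(String label, String text) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = termCounts.computeIfAbsent(label, l -> new HashMap<>());
        for (String tok : tokenize(text)) {
            counts.merge(tok, 1, Integer::sum);
            totalTerms.merge(label, 1, Integer::sum);
            vocab.add(tok);
        }
    }

    // Prediction needs no label: return the label with the highest log posterior score.
    String predict(String text) {
        List<String> tokens = tokenize(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double score = Math.log(docCounts.get(label) / (double) totalDocs);
            Map<String, Integer> counts = termCounts.get(label);
            int total = totalTerms.getOrDefault(label, 0);
            for (String tok : tokens) {
                int c = counts.getOrDefault(tok, 0);
                score += Math.log((c + 1.0) / (total + vocab.size()));
            }
            if (score > bestScore) { bestScore = score; best = label; }
        }
        return best;
    }

    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) if (!t.isEmpty()) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        TinyNaiveBayes nb = new TinyNaiveBayes();
        nb.train("sports", "the team won the football match");
        nb.train("tech", "the new laptop has a fast processor");
        System.out.println(nb.predict("football match"));
    }
}
```

Scores are summed in log space so that long documents do not underflow to zero.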

Re: Need a little help with SVD / Dimensional Reduction

2011-06-08 Thread Sean Owen
Hadoop is Hadoop, so I don't know that any roadmap inconsistency between CDH and Hadoop is somehow Hadoop's fault. I don't think it's that ambiguous. Mahout runs on 0.20.2. Amazon EMR runs 0.20.2. The latest Hadoop version is 0.20.203.0. CDH is indeed somewhere in between, but that's CDH. On Wed, Ju