Re: java.lang.NoClassDefFoundError: org/apache/commons/cli2/Option

2012-08-27 Thread Phoenix Bai
Or, instead of invoking Mahout as $ hadoop jar mahout-core-0.5.jar, you should try $ mahout ... The mahout script lives in $MAHOUT_HOME/bin and loads all the necessary jar files before running any classes. The jars that Mahout requires are normally put in $MAHOUT_HOME/lib, e.g.

Re: More mappers in RowId

2012-08-27 Thread Anna Lahoud
Thank you Dan. I understand that having multiple rowid files for the next step(s) is the key. In my testing though, no matter how I slice it up, I am unable to get the rowsimilarity job to complete on my inputs. It easily completes the first phase of the job with my multiple (100) matrix files.

Re: does seq2sparse or kmeans filter data ? I am losing data!

2012-08-27 Thread Jeff Eastman
Offhand, I wonder why you are specifying only a single part-m-0 file in your clusterdump step? If there is more than one part file (the usual case), then you might be missing some of the clustered points. If so, then using the directory instead might help: --pointsDir

Re: Can someone suggest an approach for calculating precision and recall for distributed recommendations?

2012-08-27 Thread Ted Dunning
Obviously, you also need to refer to the scores of other items. One handy stat is AUC, which you can compute by averaging to get the probability that a relevant (viewed) item has a higher recommendation score than a non-relevant (not viewed) item. On Sun, Aug 26, 2012 at 5:55 PM, Sean Owen
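The AUC averaging described in this message could be sketched roughly as follows. This is only an illustration, not code from the thread or from Mahout: the function name `auc`, the plain-list inputs, and the choice to count tied scores as half a win are all assumptions made for the example.

```python
from itertools import product


def auc(relevant_scores, non_relevant_scores):
    """Estimate AUC as the fraction of (relevant, non-relevant) pairs in
    which the relevant item has the higher recommendation score.

    Assumption (not from the thread): a tie counts as half a win.
    """
    if not relevant_scores or not non_relevant_scores:
        raise ValueError("need at least one score in each group")

    wins = 0.0
    # Compare every relevant (viewed) item with every non-relevant one.
    for rel, non_rel in product(relevant_scores, non_relevant_scores):
        if rel > non_rel:
            wins += 1.0
        elif rel == non_rel:
            wins += 0.5

    # Averaging over all pairs gives the estimated probability.
    return wins / (len(relevant_scores) * len(non_relevant_scores))


if __name__ == "__main__":
    # Hypothetical scores for viewed and not-viewed items.
    viewed = [0.9, 0.7, 0.4]
    not_viewed = [0.8, 0.3, 0.2]
    print(auc(viewed, not_viewed))
```

The all-pairs loop is quadratic in the number of scored items, so for a large test set one would probably sample pairs or use a rank-based formulation instead.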

Re: Visualization of word clusters

2012-08-27 Thread Ted Dunning
Here is some pretty old work that did the same sort of thing. The self-organizing map (SOM) is an interesting alternative to MDS since it allows mapping a low-dimensional approximate manifold to a linear space. The basic idea is that it preserves close distances and doesn't much care about

Re: does seq2sparse or kmeans filter data ? I am losing data!

2012-08-27 Thread Phoenix Bai
Hi Jeff, first of all, thank you for your response. Unfortunately, I don't think that is the cause. As I checked, there is only one file, part-m-0, under the clusteredPoints directory. $ hadoop fs -ls /bmz/mahout/output/videotags-kmeans-clusters/clusteredPoints Found 1 items -rw-r- 3

Re: Can someone suggest an approach for calculating precision and recall for distributed recommendations?

2012-08-27 Thread Ted Dunning
In another forum, I responded to this question this way: One short answer is that you only need enough test data to drive the accuracy of your PR estimates to the point you need them. That isn't all that much data, so the sequential version should do rather well. The gold standard, of course,