Or, instead of invoking Mahout as $ hadoop jar mahout-core-0.5.jar ...,
you should try $ mahout ...
In $MAHOUT_HOME/bin there is the mahout script, which loads all the
necessary jar files before running any classes. The jars required by
Mahout are normally kept in $MAHOUT_HOME/lib.
e.g.
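A k-means run through the wrapper script might look like this (the paths and parameter values below are hypothetical, just to show the shape of the invocation):

```shell
# Using the wrapper script; it builds the classpath from $MAHOUT_HOME/lib itself,
# so no explicit "hadoop jar mahout-core-0.5.jar" is needed.
# All paths and option values here are made up for illustration.
$MAHOUT_HOME/bin/mahout kmeans \
  --input /user/me/vectors \
  --clusters /user/me/initial-clusters \
  --output /user/me/kmeans-output \
  --numClusters 10 \
  --maxIter 10 \
  --clustering
```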
Thank you, Dan. I understand that having multiple rowid files for the next
step(s) is the key. In my testing, though, no matter how I slice it up, I am
unable to get the rowsimilarity job to complete on my inputs. It easily
completes the first phase of the job with my multiple (100) matrix files.
Offhand, I wonder why you are specifying only a single part-m-0 file
in your clusterdump step? If there is more than one part file (the usual
case), then you might be missing some of the clustered points. If so,
using the directory instead might help:
--pointsDir
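For example, a clusterdump invocation that points --pointsDir at the whole directory rather than one part file might look like the following (the input/output paths are hypothetical placeholders, loosely modeled on the paths elsewhere in this thread):

```shell
# Pass the whole clusteredPoints directory so that every part-m-* file is
# picked up, not just a single one. Paths here are hypothetical.
mahout clusterdump \
  --input /bmz/mahout/output/videotags-kmeans-clusters/clusters-final \
  --pointsDir /bmz/mahout/output/videotags-kmeans-clusters/clusteredPoints \
  --output clusterdump.txt
```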
Obviously, you also need to refer to the scores of other items.
One handy stat is AUC, which you can compute by averaging pairwise
comparisons to get the probability that a relevant (viewed) item has a
higher recommendation score than a non-relevant (not viewed) item.
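As a sketch of that averaging: compare every (viewed, not-viewed) score pair and take the fraction where the viewed item wins (counting ties as half). The score values below are made up purely for illustration:

```shell
# Hypothetical recommendation scores for relevant (viewed) and
# non-relevant (not viewed) items.
viewed="0.9 0.7 0.4"
nonviewed="0.6 0.3 0.2"

# AUC = fraction of (viewed, nonviewed) pairs where the viewed item
# scores higher; ties count as 0.5.
auc=$(awk -v v="$viewed" -v n="$nonviewed" 'BEGIN {
  split(v, a); split(n, b)
  wins = 0; pairs = 0
  for (i in a) for (j in b) {
    pairs++
    if (a[i] > b[j])       wins++
    else if (a[i] == b[j]) wins += 0.5
  }
  printf "%.4f", wins / pairs
}')
echo "AUC = $auc"
```

Here the 0.9 and 0.7 scores each beat all three non-viewed scores and 0.4 beats two of them, so 8 of the 9 pairs are wins and the AUC is 8/9 ≈ 0.8889.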
On Sun, Aug 26, 2012 at 5:55 PM, Sean Owen
Here is some pretty old work that did the same sort of thing. The
self-organizing map (SOM) is an interesting alternative to MDS, since it
allows mapping a low-dimensional approximate manifold to a linear space. The
basic idea is that it preserves close distances and doesn't much care about
Hi Jeff,
first of all, thank you for your response.
But unfortunately, I don't think that is the cause. As I checked, there is
only one file, part-m-0, under the directory clusteredPoints.
$ hadoop fs -ls /bmz/mahout/output/videotags-kmeans-clusters/clusteredPoints
Found 1 items
-rw-r- 3
In another forum, I responded to this question this way:
One short answer is that you only need enough test data to drive the
accuracy of your PR estimates to the point you need them. That isn't all
that much data, so the sequential version should do rather well.
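A back-of-envelope way to see this (assuming "PR" here means precision/recall): a precision estimate is a binomial proportion, so its standard error shrinks like sqrt(p*(1-p)/n). The assumed true precision below is made up for illustration:

```shell
# Standard error of a precision estimate at an assumed true precision p,
# via the binomial formula sqrt(p*(1-p)/n). p is a hypothetical value.
p=0.8
for n in 100 1000 10000; do
  se=$(awk -v p="$p" -v n="$n" 'BEGIN { printf "%.4f", sqrt(p * (1 - p) / n) }')
  echo "n=$n  SE=$se"
done
```

Even at n = 1000 test points the standard error is already near 0.01, which is why a sequential evaluation over a modest test set is usually accurate enough.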
The gold standard, of course,