Re: Pseudo-Inverse map reduce implementation

2012-10-18 Thread Ted Dunning
Computing the SVD with the stochastic projection is your best bet. On Oct 17, 2012, at 10:42 PM, Ranjith Uthaman wrote: > Hi, > > Does map reduce implementation of Pseudo-Inverse of a matrix exist in the > current Mahout framework? What are the various ways to achieve it
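
For readers who want to see the idea in miniature, here is a rough NumPy sketch of a pseudo-inverse obtained from a randomized-projection SVD. It is not Mahout's stochastic SVD job; the function name `randomized_pinv`, the `oversample` parameter, and the toy matrix sizes are assumptions made for illustration only.

```python
import numpy as np

def randomized_pinv(A, rank, oversample=10, seed=0):
    """Approximate pinv(A) from a rank-`rank` randomized SVD (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Project A onto a small random subspace to capture its range.
    omega = rng.standard_normal((n, rank + oversample))
    Q, _ = np.linalg.qr(A @ omega)
    # SVD of the much smaller matrix B = Q' * A.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    # Keep only the leading `rank` singular triplets.
    U = (Q @ U_small)[:, :rank]
    s = s[:rank]
    Vt = Vt[:rank, :]
    # pinv(A) ~= V * diag(1/s) * U'
    return (Vt.T / s) @ U.T

# Toy low-rank input: 200 x 60 matrix of exact rank 5.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 60))
P = randomized_pinv(A, rank=5)
```

For an exactly low-rank input like this one, `A @ P @ A` reproduces `A` up to rounding; for noisy real data the result is only an approximation whose quality depends on the chosen rank.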

Re: Pseudo-Inverse map reduce implementation

2012-10-18 Thread Sean Owen
I asked in reply on Quora: what exactly are you computing? What is the size of the input, and are you talking about a generalized inverse? Depending on this, there are easier ways than an SVD. On Thu, Oct 18, 2012 at 6:42 AM, Ranjith Uthaman wrote: > Hi, > > Does map reduce implementation of Pseudo-I

Re: Documentation for ParallelALSFactorizationJob

2012-10-18 Thread Sebastian Schelter
Hello Kris, We have an example of how to use ALS to factorize the movielens dataset, which also includes generating recommendations from the factorization: examples/bin/factorize-movielens-1M.sh We don't have automated code to find appropriate values for alpha and lambda unfortunately. There is

Re: Documentation for ParallelALSFactorizationJob

2012-10-18 Thread Sean Owen
Alpha matters very little by itself. It is used to create a weight in the cost function of 1 + alpha * rating. Except for the "1 +", it would not matter, as it just scales all weights in the first half of the cost function proportionally. Where it does matter is in relation to lambda, because
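
A small sketch of the weight described above, to make the role of the "1 +" concrete. The function names and the sample ratings and alpha values are hypothetical; this is not code from ParallelALSFactorizationJob.

```python
def confidence(rating, alpha):
    """Weight used in the cost function: 1 + alpha * rating."""
    return 1.0 + alpha * rating

def relative_weight(high, low, alpha):
    """How much more a high rating counts than a low one."""
    return confidence(high, alpha) / confidence(low, alpha)

# Hypothetical ratings and alpha values, chosen only for illustration.
for alpha in (1.0, 40.0):
    print(alpha, relative_weight(5.0, 1.0, alpha))
```

Without the "1 +", the ratio between the two weights would be 5 for every alpha, since alpha would cancel out. With it, the ratio moves from 3 at alpha = 1 toward 5 as alpha grows, so alpha has only a mild effect on the relative weighting.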

Re: Documentation for ParallelALSFactorizationJob

2012-10-18 Thread Sebastian Schelter
> I don't know if there is code, > probably not, but conceptually that is all that it involves. Once you factorized your interaction matrix, you can use org.apache.mahout.cf.taste.hadoop.als.RecommenderJob to compute recommendations in parallel. Best, Sebastian

RE: Pseudo-Inverse map reduce implementation

2012-10-18 Thread Ranjith Uthaman
The end goal is to build a content-based recommender of items for each user. The user-based and item-based recommenders in Mahout, as discussed in the Mahout in Action book, don't fare very well given the data available. A content-based recommender approach is also hinted at in the book.

Re: Pseudo-Inverse map reduce implementation

2012-10-18 Thread Sean Owen
So you have a factorization like A = X * Y' and you are looking for the right inverse of Y' (where Y is the item-feature matrix)? This is just Y * pinv(Y' * Y). Y' * Y takes a little work to compute, but can be done in one pass over the matrix. Y' * Y is just a 1000x1000 matrix which you can inver
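
The formula above can be checked numerically with a short NumPy sketch. The matrix sizes and random data are assumptions for illustration (the thread mentions 1000 features; 4 are used here), and `right_inverse` is a hypothetical helper, not a Mahout class.

```python
import numpy as np

def right_inverse(Y):
    """Return Y * pinv(Y' * Y), a right inverse of Y' when Y has full column rank."""
    # Y' * Y is small (features x features) and is a sum of per-row outer
    # products, which is why it can be accumulated in one pass over Y.
    gram = Y.T @ Y
    return Y @ np.linalg.pinv(gram)

rng = np.random.default_rng(0)
Y = rng.standard_normal((50, 4))   # 50 items x 4 features (toy sizes)
X = rng.standard_normal((8, 4))    # 8 users x 4 features
A = X @ Y.T                        # A = X * Y'

R = right_inverse(Y)
identity_ok = np.allclose(Y.T @ R, np.eye(4))   # Y' * R = I
X_recovered = A @ R                             # A * R = X * Y' * R = X
```

Because only the small Gram matrix is inverted, the expensive part is the single pass that accumulates Y' * Y; the inversion itself fits comfortably in memory on one machine.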

Re: mahout 0.5 to 0.7 commandline parameter of lda

2012-10-18 Thread Jake Mannix
For Mahout 0.7, the format of the model files for LDA is just a SequenceFile, with the row numbers being the topicIds, and the entries being the (un-normalized) probabilities for each termId. bin/vectordump --dictionary \ --dictionaryType \ --in

Re: mahout 0.5 to 0.7 commandline parameter of lda

2012-10-18 Thread Vineeth
I am running LDA for the first time. I gave the following command to test over the Reuters dataset, but I got the error: lda -i reuters-vectors/tf-vectors -o reuters-lda-sparse -k 10 -v 7000 -x 20 -ow hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally SLF4J: C

Changing "mapred.reduce.tasks" makes various results on recommender

2012-10-18 Thread Y. Sakamoto
Hi, I'm using both the Hadoop-based and the non-Hadoop-based recommender. When I change only the Hadoop parameter "mapred.reduce.tasks", the recommender result also changes. (I cannot get the same result.) Why does this happen? Thanks, Y. Sakamoto

Re: Changing "mapred.reduce.tasks" makes various results on recommender

2012-10-18 Thread Sean Owen
Some algorithms use a random seed, which could give different answers on the same input. That's the first question. On Thu, Oct 18, 2012 at 6:21 PM, Y. Sakamoto wrote: > Hi, > > I'm using Hadoop-based and non-Hadoop-based recommender. > > When I change only hadoop parameter "mapred.reduce.tasks",

Re: mahout 0.5 to 0.7 commandline parameter of lda

2012-10-18 Thread Jake Mannix
On Thu, Oct 18, 2012 at 9:16 AM, Vineeth wrote: > I am running the lda for the first time. I gave the following command to > test over the Reuters dataset but i got the error > > lda -i reuters-vectors/tf-vectors -o reuters-lda-sparse -k 10 -v 7000 -x > 20 -ow > > hadoop binary is not in PATH,HAD

Problems with testing Naive Bayes for small number of test cases in one category

2012-10-18 Thread Andrea Leistra
I'm working on a naive Bayes classifier in a case where a few categories are much less common than the rest. In the latest run of the process it happened that no instances of one of these ended up in the test set. As a result testnb failed with the following error (actual name of the label elide

Re: arrayindexoutofboundsexception

2012-10-18 Thread Amruta
Hi Chiranjeevi, were you able to solve this issue? Thanks.

Jake Mannix Elected to Mahout PMC Chair

2012-10-18 Thread Jeff Eastman
Congratulations Jake! Jake Mannix has been recommended by the Mahout PMC to replace me as the new Mahout PMC Chairperson and the Apache Board has approved this recommendation in their meeting this week. Please join me in congratulating Jake on his new role in this exciting and interesting Apa

K-Means generates only one cluster

2012-10-18 Thread syed kather
Team, Version used: Mahout 0.6. Hadoop: 5 nodes (1 master + 4 slaves). Once we had generated k-means clusters for 60 documents, I ran clusterdump, which extracts the top terms from each cluster. There I noticed that only one cluster was made even though we had specified the n

Re: K-Means generates only one cluster

2012-10-18 Thread DAN HELM
We previously did some k-means clustering runs on different-sized collections and noticed that a large cluster was often created along with some smaller ones. In digging deeper, it turned out a lot of the document vectors (produced via the seq2sparse command) were null (empty). k-means appa

Re: Jake Mannix Elected to Mahout PMC Chair

2012-10-18 Thread Jake Mannix
Thanks Jeff, I'm happy to take up the mantle, send our mapper legions into the vast uncharted lands of Big Data, reduce the unwashed noisy barbarians to clean and convergent nuggets of intelligence and models to be launched into more mappers! Or is this more like one of those "bully pulpit" pos