Re: Parallel item-based recommender job

2013-03-25 Thread Sebastian Schelter
Hi JU, are you sure regarding 1.? It would be a bug. How exactly do you call the job? 2. The threshold is used during the similarity computation and is a lower bound for the similarities considered. For certain measures (like Pearson or Cosine) it also allows pruning some item pairs early. You
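A minimal sketch of how such a lower-bound threshold behaves during item-item similarity computation (toy data, not Mahout's actual implementation): pairs whose similarity falls below the threshold are simply dropped.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# toy rating matrix: rows are users, columns are items
R = np.array([[5., 4., 1.],
              [4., 5., 0.],
              [1., 0., 5.]])

threshold = 0.5  # lower bound: item pairs below this are not considered

similar_pairs = []
for i in range(R.shape[1]):
    for j in range(i + 1, R.shape[1]):
        s = cosine(R[:, i], R[:, j])
        if s >= threshold:           # prune low-similarity pairs
            similar_pairs.append((i, j, round(s, 3)))
```

Here only the first pair of items (with near-identical rating columns) survives the threshold.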

Re: sql data model w/where clause

2013-03-25 Thread Sean Owen
Modify the existing code to change the SQL -- it's just a matter of copying a class that only specifies SQL and making new SQL statements. I think there's a version that even reads from a Properties object. On Mon, Mar 25, 2013 at 12:11 AM, Matt Mitchell goodie...@gmail.com wrote: Hi, I have a

Re: Mathematical background of ALS recommenders

2013-03-25 Thread Sean Owen
Points from across several e-mails -- The initial item-feature matrix can be just random unit vectors too. I have slightly better results with that. You are finding the least-squares solution of A = U M' for U given A and M. Yes you can derive that analytically as the zero of the derivative of
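To make the "zero of the derivative" concrete, here is a hedged NumPy sketch of the least-squares solve for U given A and M (dense toy data; the flat lambda ridge term is an assumption for illustration — ALS-WR actually scales the regularizer per row by the number of ratings):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 6, 5, 3
lam = 0.1                               # illustrative ridge term

A = rng.random((n_users, n_items))      # toy dense "rating" matrix
M = rng.normal(size=(n_items, k))       # item-feature matrix, random init

# Setting the derivative of ||A - U M'||^2 + lam ||U||^2 w.r.t. U to zero
# gives the normal equations  U (M'M + lam I) = A M :
U = np.linalg.solve(M.T @ M + lam * np.eye(k), (A @ M).T).T

# the gradient vanishes at the solution
grad = -2 * (A - U @ M.T) @ M + 2 * lam * U
```

Alternating this solve (for U given M, then M given U) is the ALS loop.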

Re: Mathematical background of ALS recommenders

2013-03-25 Thread Ted Dunning
Even more in-line. On Mon, Mar 25, 2013 at 11:46 AM, Sean Owen sro...@gmail.com wrote: Points from across several e-mails -- The initial item-feature matrix can be just random unit vectors too. I have slightly better results with that. You are finding the least-squares solution of A = U M'

Re: Mathematical background of ALS recommenders

2013-03-25 Thread Ted Dunning
Well, actually, you can. LSI does exactly that. The effect of doing this is not clear to me. Do you know what happens if you assume missing values are 0? On Mon, Mar 25, 2013 at 12:10 PM, Sebastian Schelter s...@apache.org wrote: I think one crucial point is missing from this

Re: Mathematical background of ALS recommenders

2013-03-25 Thread Sean Owen
OK, the 'k iterations' happen inline in one job? I thought the Lanczos algorithm found the k eigenvalues/vectors one after the other. Yeah I suppose that doesn't literally mean k map/reduce jobs. Yes the broader idea was whether or not you might get something useful out of ALS earlier. On Mon,

Re: Mathematical background of ALS recommenders

2013-03-25 Thread Sebastian Schelter
Well, in LSI it is OK to do that, as a missing entry means that the document contains zero occurrences of a given term, which is totally fine. In Collaborative Filtering with explicit feedback, a missing rating is not automatically a rating of zero; it is simply unknown what the user would give as
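A tiny numeric illustration of this point (made-up numbers): the best constant least-squares fit to an item's ratings is their mean, so imputing zeros for all the users who never rated the item drags the estimate toward zero even when every actual rating is high.

```python
import numpy as np

observed = np.array([4.0, 5.0, 3.0])   # the users who actually rated the item
n_missing = 97                          # users who never rated it

mean_observed = observed.mean()                                   # 4.0
mean_with_zeros = observed.sum() / (observed.size + n_missing)    # 0.12
```

With zero-imputation the model would "learn" that an item everyone liked deserves a near-zero score.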

Re: Mathematical background of ALS recommenders

2013-03-25 Thread Sean Owen
On Mon, Mar 25, 2013 at 11:25 AM, Sebastian Schelter s...@apache.org wrote: Well in LSI it is ok to do that, as a missing entry means that the document contains zero occurrences of a given term which is totally fine. In Collaborative Filtering with explicit feedback, a missing rating is not

Creating dataset from Lucene Index

2013-03-25 Thread Fabrizio Macedonio
Hi all, is it possible to create a Mahout dataset from a Lucene index? How can I create a dataset from my document files (doc, docx, pdf)? Thanks, Fabrizio

Re: postgres recommendation adapter

2013-03-25 Thread Sean Owen
Are you using the 'integration' artifact? this is not in 'core'. On Mon, Mar 25, 2013 at 12:43 PM, Matt Mitchell goodie...@gmail.com wrote: Yeah sorry. I'm attempting to load this class: org.apache.mahout.cf.taste.impl.model.jdbc.PostgreSQLBooleanPrefJDBCDataModel but getting a

How Clustering Algorithm creates cluster names

2013-03-25 Thread VIGNESH S
Hi, As far as I understand, with some exceptions, most clustering algorithms will produce the cluster label by themselves. Can someone explain to me on what basis the cluster label is created in any clustering algorithm? For example, in algorithms like k-means, even though we mention the number of clusters
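A sketch of one common convention (roughly what Mahout's clusterdump utility does, though this toy code is not Mahout's implementation): derive a cluster's "name" from the top-weighted terms of its centroid. The vocabulary, weights, and assignments below are invented:

```python
import numpy as np

vocab = ["ball", "goal", "team", "stock", "market", "price"]

# toy TF-IDF-like doc-term weights; assume k-means has already run
X = np.array([
    [3., 4., 1., 0., 0., 0.],
    [1., 2., 2., 0., 0., 0.],
    [0., 0., 0., 4., 2., 1.],
    [0., 0., 0., 2., 3., 2.],
])
assignments = np.array([0, 0, 1, 1])   # cluster id per document

def cluster_label(cluster_id, top_n=2):
    centroid = X[assignments == cluster_id].mean(axis=0)
    top = np.argsort(centroid)[::-1][:top_n]   # highest-weight terms
    return "-".join(vocab[t] for t in top)
```

Here `cluster_label(0)` gives "goal-ball": the label is purely descriptive, derived from the data after the fact, not something the algorithm itself "knows".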

Re: postgres recommendation adapter

2013-03-25 Thread Matt Mitchell
Oye I feel dumb now. Thanks, once again Sean. Moving on :) - Matt On Mon, Mar 25, 2013 at 9:01 AM, Sean Owen sro...@gmail.com wrote: Are you using the 'integration' artifact? this is not in 'core'. On Mon, Mar 25, 2013 at 12:43 PM, Matt Mitchell goodie...@gmail.com wrote: Yeah sorry. I'm

Re: Mathematical background of ALS recommenders

2013-03-25 Thread Koobas
On Mon, Mar 25, 2013 at 7:48 AM, Sean Owen sro...@gmail.com wrote: On Mon, Mar 25, 2013 at 11:25 AM, Sebastian Schelter s...@apache.org wrote: Well in LSI it is ok to do that, as a missing entry means that the document contains zero occurrences of a given term which is totally fine. In

Re: Mathematical background of ALS recommenders

2013-03-25 Thread Sean Owen
(The unobserved entries are still in the loss function, just with low weight. They are also in the system of equations you are solving for.) On Mon, Mar 25, 2013 at 1:38 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Classic als wr is bypassing underlearning problem by cutting out unrated

Re: Mathematical background of ALS recommenders

2013-03-25 Thread Dmitriy Lyubimov
On Mar 25, 2013 6:38 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Mar 25, 2013 4:15 AM, Ted Dunning ted.dunn...@gmail.com wrote: Well, actually, you can. LSI does exactly that. What the effect is of doing this is not clear to me. Do you know what happens if you assume missing

Re: Mathematical background of ALS recommenders

2013-03-25 Thread Dmitriy Lyubimov
On Mar 25, 2013 6:44 AM, Sean Owen sro...@gmail.com wrote: (The unobserved entries are still in the loss function, just with low weight. They are also in the system of equations you are solving for.) Not in the classic alswr paper i was specifically referring to. It actually uses minors of

Re: Mathematical background of ALS recommenders

2013-03-25 Thread Sean Owen
On Mon, Mar 25, 2013 at 1:41 PM, Koobas koo...@gmail.com wrote: But the assumption works nicely for click-like data. Better still when you can weakly prefer to reconstruct the 0 for missing observations and much more strongly prefer to reconstruct the 1 for observed data. This does seem
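A hedged sketch of the weighting idea for a single user, in the style of the implicit-feedback formulation (the confidence form c = 1 + alpha*r is taken from that family of methods; the alpha and lambda values here are arbitrary): observed entries are reconstructed as 1 with high weight, missing entries as 0 with weight 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, k, alpha, lam = 5, 2, 40.0, 0.1

Y = rng.normal(size=(n_items, k))        # item factors (pretend already learned)
r = np.array([3.0, 0.0, 1.0, 0.0, 0.0])  # one user's click counts
p = (r > 0).astype(float)                # preference: 1 if observed, else 0
c = 1.0 + alpha * r                      # confidence: weak on 0s, strong on clicks

# weighted ridge solve: minimize sum_i c_i (p_i - x'y_i)^2 + lam ||x||^2
C = np.diag(c)
x = np.linalg.solve(Y.T @ C @ Y + lam * np.eye(k), Y.T @ C @ p)

def loss(v):
    return float(c @ (p - Y @ v) ** 2 + lam * v @ v)
```

The unobserved entries stay in the loss with weight 1, so the model weakly prefers 0 there while strongly fitting the observed 1s.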

Re: Mathematical background of ALS recommenders

2013-03-25 Thread Sebastian Schelter
As clarification, here are the relevant papers. The approach for explicit feedback [1] does not use unobserved cells; only the approach for handling implicit feedback [2] does, but weighs them down. /s [1] Large-scale Parallel Collaborative Filtering for the Netflix Prize

Re: Mathematical background of ALS recommenders

2013-03-25 Thread Koobas
On Mon, Mar 25, 2013 at 9:52 AM, Sean Owen sro...@gmail.com wrote: On Mon, Mar 25, 2013 at 1:41 PM, Koobas koo...@gmail.com wrote: But the assumption works nicely for click-like data. Better still when you can weakly prefer to reconstruct the 0 for missing observations and much more

Re: Mathematical background of ALS recommenders

2013-03-25 Thread Sean Owen
If your input is clicks, carts, etc. yes you ought to get generally good results from something meant to consume implicit feedback, like ALS (for implicit feedback, yes there are at least two main variants). I think you are talking about the implicit version since you mention 0/1. lambda is the

Re: Mathematical background of ALS recommenders

2013-03-25 Thread Koobas
On Mon, Mar 25, 2013 at 10:43 AM, Sean Owen sro...@gmail.com wrote: If your input is clicks, carts, etc. yes you ought to get generally good results from something meant to consume implicit feedback, like ALS (for implicit feedback, yes there are at least two main variants). I think you are

Re: Mathematical background of ALS recommenders

2013-03-25 Thread Ted Dunning
Yes. But SSVD != Lanczos. Lanczos is vector-at-a-time sequential, like you said. SSVD does all the vectors in one go. That one go requires a few steps, but does not require O(k) iterations. On Mon, Mar 25, 2013 at 12:16 PM, Sean Owen sro...@gmail.com wrote: OK, the 'k iterations' happen
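A hedged NumPy sketch of the "all vectors in one go" idea behind stochastic SVD (randomized range sampling in the style of Halko/Martinsson/Tropp, not Mahout's actual code; the toy matrix is exactly rank 5 so the recovery is exact):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 40))  # rank-5 matrix

k, oversample = 5, 10
Omega = rng.normal(size=(40, k + oversample))  # one random test matrix
Y = A @ Omega                       # sample the range of A in a single multiply
Q, _ = np.linalg.qr(Y)              # orthonormal basis for that sample
B = Q.T @ A                         # small projected matrix
Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
U = Q @ Ub                          # approximate left singular vectors of A

# all k singular triples come out of this single pass; no k-step iteration
```

Contrast with Lanczos, which builds up its Krylov basis one matrix-vector product at a time.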

Re: Mathematical background of ALS recommenders

2013-03-25 Thread Abhijith CHandraprabhu
Sorry, I actually meant svds (sparse SVD). I think in Mahout they use Lanczos also. On Mon, Mar 25, 2013 at 4:25 PM, Ted Dunning ted.dunn...@gmail.com wrote: Yes. But SSVD != Lanczos. Lanczos is vector at at time sequential like you said. SSVD does all the vectors in one go. That one go

Re: Mathematical background of ALS recommenders

2013-03-25 Thread Ted Dunning
No. We don't. We used to use Lanczos, but that has improved. On Mon, Mar 25, 2013 at 4:43 PM, Abhijith CHandraprabhu abhiji...@gmail.com wrote: Sorry, I actually meant svds(sparse SVD). I think in mahout they use Lanczos also. On Mon, Mar 25, 2013 at 4:25 PM, Ted Dunning

SequenceFileVectorWriter key class

2013-03-25 Thread Ryan Josal
Hi all, In looking for a solution to the type mismatch between the output of lucene.vector and the input of cvb lda, I found org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter in the mahout integration source assumes the SequenceFile.Writer object it takes as a constructor parameter's