After thinking about it more, I do think weighting lambda by sum_i c_ij is the equivalent of the ALS-WR paper's approach for the implicit case. This provides scale-invariance for varying numbers of products/users and for varying ratings, and should behave well for all alphas. What do you guys think?
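The Scala sketch below makes the three candidate scalings in this thread concrete for a single item sub-problem. It is not Spark MLlib code: `LambdaScaling`, `confidence` and `multipliers` are hypothetical names. It assumes the Hu, Koren & Volinsky convention c_ij = 1 + alpha * r_ij for observed entries and c_ij = 1 for unobserved ones, and it reads sum_i c_ij as a sum over all users. Spark's internal confidence weighting may differ. For each scheme it computes the multiplier X applied to lambda: the 1.2 / ALS-WR count of observed ratings n_j, the 1.3.0 count of all users m, and the sum_i c_ij proposal.

```scala
// Sketch only: LambdaScaling, confidence and multipliers are hypothetical
// names, not part of Spark MLlib.
object LambdaScaling {
  // Assumed confidence convention (Hu, Koren & Volinsky): c = 1 + alpha * r
  // for an observed implicit rating r, and c = 1 for unobserved entries.
  // Spark's internal weighting may differ.
  def confidence(r: Double, alpha: Double): Double = 1.0 + alpha * r

  // Candidate multipliers X for lambda in the sub-problem of one item j.
  final case class Multipliers(
      numObserved: Double,  // n_j: users who interacted with item j (1.2 / ALS-WR)
      numUsers: Double,     // m: every user appears in the implicit sub-problem (1.3.0)
      sumConfidence: Double // sum_i c_ij over all m users (proposal in this thread)
  )

  def multipliers(observed: Seq[Double], numUsers: Int, alpha: Double): Multipliers = {
    require(observed.size <= numUsers, "more observed ratings than users")
    val numUnobserved = numUsers - observed.size
    // Unobserved users contribute confidence 1 each; observed ones 1 + alpha * r_ij.
    val sumC = numUnobserved.toDouble + observed.map(r => confidence(r, alpha)).sum
    Multipliers(observed.size.toDouble, numUsers.toDouble, sumC)
  }

  def main(args: Array[String]): Unit = {
    val lambda = 0.01
    // Item j with two observed ratings (1 and 3) among ten users, alpha = 2.
    val x = multipliers(Seq(1.0, 3.0), numUsers = 10, alpha = 2.0)
    println(f"X = n_j        -> lambda * X = ${lambda * x.numObserved}%.2f")
    println(f"X = m          -> lambda * X = ${lambda * x.numUsers}%.2f")
    println(f"X = sum_i c_ij -> lambda * X = ${lambda * x.sumConfidence}%.2f")
  }
}
```

In this toy case the multipliers are 2, 10 and 18, so lambda = 0.01 becomes 0.02, 0.10 and 0.18. Under this reading, sum_i c_ij equals m + alpha * sum_i r_ij. For small alpha it approaches the 1.3.0 multiplier m. For large alpha it is dominated by the (weighted) ratings term. That matches the trade-off between small and large alphas described further down the thread.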
On Wed, May 6, 2015 at 12:29 PM, Ravi Mody <rmody...@gmail.com> wrote:
> Whoops, I just saw this thread; it got caught in my spam filter. Thanks for
> looking into this, Xiangrui and Sean.
>
> The implicit situation does seem fairly complicated to me. The cost
> function (not including the regularization term) is affected both by the
> number of ratings and by the number of users/products. As we increase alpha,
> the contribution to the cost function from the number of users/products
> diminishes compared to the contribution from the number of ratings. So
> large alphas seem to favor the weighted-lambda approach, even though it's
> not a perfect match. Smaller alphas favor Xiangrui's 1.3.0 approach, but
> again it's not a perfect match.
>
> I believe low alphas won't work well with regularization because both
> terms in the cost function will just push everything to zero. Some of my
> experiments confirm this. This leads me to think that weighted-lambda would
> work better in practice, but I have no evidence of this. It may make sense
> to weight lambda by sum_i c_ij instead?
>
> On Wed, Apr 1, 2015 at 7:59 PM, Xiangrui Meng <men...@gmail.com> wrote:
>
>> Ravi, we just merged https://issues.apache.org/jira/browse/SPARK-6642
>> and used the same lambda scaling as in 1.2. The change will be
>> included in Spark 1.3.1, which will be released soon. Thanks for
>> reporting this issue! -Xiangrui
>>
>> On Tue, Mar 31, 2015 at 8:53 PM, Xiangrui Meng <men...@gmail.com> wrote:
>> > I created a JIRA for this:
>> > https://issues.apache.org/jira/browse/SPARK-6637. Since we don't have
>> > a clear answer about how the scaling should be handled, maybe the best
>> > solution for now is to switch back to the 1.2 scaling. -Xiangrui
>> >
>> > On Tue, Mar 31, 2015 at 2:50 PM, Sean Owen <so...@cloudera.com> wrote:
>> >> Ah yeah, I take your point. The squared error term is over the whole
>> >> user-item matrix, technically, in the implicit case.
>> >> I suppose I am used to assuming that the 0 terms in this matrix are
>> >> weighted so much less (because alpha is usually large-ish) that they're
>> >> almost not there, but they are. So I had just used the explicit formulation.
>> >>
>> >> I suppose the result is kind of scale invariant, but not exactly. I
>> >> had not prioritized this property since I had generally built models
>> >> on the full data set and not a sample, and had assumed that lambda
>> >> would need to be retuned over time as the input grew anyway.
>> >>
>> >> So, basically I don't know anything more than you do, sorry!
>> >>
>> >> On Tue, Mar 31, 2015 at 10:41 PM, Xiangrui Meng <men...@gmail.com> wrote:
>> >>> Hey Sean,
>> >>>
>> >>> That is true for the explicit model, but not for the implicit one. The
>> >>> ALS-WR paper doesn't cover the implicit model. In the implicit
>> >>> formulation, a sub-problem (for v_j) is:
>> >>>
>> >>> min_{v_j} \sum_i c_ij (p_ij - u_i^T v_j)^2 + lambda * X * \|v_j\|_2^2
>> >>>
>> >>> This is a sum over all i, not just the users who rated item j. In
>> >>> this case, if we set X = m_j, the number of observed ratings for item j,
>> >>> it is not really scale invariant: we have #users user vectors in the
>> >>> least squares problem but only penalize lambda * #ratings. I was
>> >>> suggesting using lambda * m directly for the implicit model, to match
>> >>> the number of vectors in the least squares problem. Well, this is my
>> >>> theory; I haven't found any public work about it.
>> >>>
>> >>> Best,
>> >>> Xiangrui
>> >>>
>> >>> On Tue, Mar 31, 2015 at 5:17 AM, Sean Owen <so...@cloudera.com> wrote:
>> >>>> I had always understood the formulation to be the first option you
>> >>>> describe. Lambda is scaled by the number of items the user has rated /
>> >>>> interacted with. I think the goal is to avoid fitting the tastes of
>> >>>> prolific users disproportionately just because they have many ratings
>> >>>> to fit.
>> >>>> This is what's described in the ALS-WR paper we link to on the
>> >>>> Spark web site, in equation 5
>> >>>> (http://www.grappa.univ-lille3.fr/~mary/cours/stats/centrale/reco/paper/MatrixFactorizationALS.pdf).
>> >>>>
>> >>>> I think this also gets you the scale-invariance? For every additional
>> >>>> rating from user i to product j, you add one new term to the
>> >>>> squared-error sum, (r_ij - u_i . m_j)^2, but you also increase the
>> >>>> regularization term by lambda * (|u_i|^2 + |m_j|^2). They are at least
>> >>>> both increasing about linearly as ratings increase. If the
>> >>>> regularization term is multiplied by the total number of users and
>> >>>> products in the model, then it's fixed.
>> >>>>
>> >>>> I might misunderstand you and/or be speaking about something slightly
>> >>>> different when it comes to invariance. But FWIW I had always
>> >>>> understood the regularization to be multiplied by the number of
>> >>>> explicit ratings.
>> >>>>
>> >>>> On Mon, Mar 30, 2015 at 5:51 PM, Xiangrui Meng <men...@gmail.com> wrote:
>> >>>>> Okay, I didn't realize that I changed the behavior of lambda in 1.3
>> >>>>> to make it "scale-invariant", but it is worth discussing whether this
>> >>>>> is a good change. In 1.2, we multiply lambda by the number of ratings
>> >>>>> in each sub-problem. This makes it "scale-invariant" for explicit
>> >>>>> feedback. However, in the implicit feedback model, a user's sub-problem
>> >>>>> contains all item factors. Then the question is whether we should
>> >>>>> multiply lambda by the number of explicit ratings from this user or by
>> >>>>> the total number of items. We used the former in 1.2 but changed to
>> >>>>> the latter in 1.3. So you should try a smaller lambda to get a similar
>> >>>>> result in 1.3.
>> >>>>>
>> >>>>> Sean and Shuo, which approach do you prefer? Do you know of any
>> >>>>> existing work discussing this?
>> >>>>>
>> >>>>> Best,
>> >>>>> Xiangrui
>> >>>>>
>> >>>>> On Fri, Mar 27, 2015 at 11:27 AM, Xiangrui Meng <men...@gmail.com> wrote:
>> >>>>>> This sounds like a bug ... Did you try a different lambda? It would be
>> >>>>>> great if you could share your dataset or reproduce this issue on a
>> >>>>>> public dataset. Thanks! -Xiangrui
>> >>>>>>
>> >>>>>> On Thu, Mar 26, 2015 at 7:56 AM, Ravi Mody <rmody...@gmail.com> wrote:
>> >>>>>>> After upgrading to 1.3.0, ALS.trainImplicit() has been returning vastly
>> >>>>>>> smaller factors (and hence scores). For example, the first few product
>> >>>>>>> factor values in 1.2.0 are (0.04821, -0.00674, -0.0325). In 1.3.0, the
>> >>>>>>> first few factor values are (2.535456E-8, 1.690301E-8, 6.99245E-8). This
>> >>>>>>> difference of several orders of magnitude is consistent across both user
>> >>>>>>> and product factors. The recommendations from 1.2.0 are subjectively much
>> >>>>>>> better than those from 1.3.0. 1.3.0 trains significantly faster than 1.2.0
>> >>>>>>> and uses less memory.
>> >>>>>>>
>> >>>>>>> My first thought is that there is too much regularization in the 1.3.0
>> >>>>>>> results, but I'm using the same lambda parameter value. This is a snippet
>> >>>>>>> of my Scala code:
>> >>>>>>> .....
>> >>>>>>> val rank = 75
>> >>>>>>> val numIterations = 15
>> >>>>>>> val alpha = 10
>> >>>>>>> val lambda = 0.01
>> >>>>>>> val model = ALS.trainImplicit(train_data, rank, numIterations,
>> >>>>>>>   lambda=lambda, alpha=alpha)
>> >>>>>>> .....
>> >>>>>>>
>> >>>>>>> The code and input data are identical across both versions. Did anything
>> >>>>>>> change between the two versions that I'm not aware of? I'd appreciate any
>> >>>>>>> help!
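Xiangrui notes that the 1.3.0 scaling calls for a smaller lambda. A rough way to see why is to solve the sub-problem he wrote down, min_{v_j} sum_i c_ij (p_ij - u_i^T v_j)^2 + lambda * X * |v_j|^2, with rank 1. There it has the closed form v_j = sum_i c_ij p_ij u_i / (sum_i c_ij u_i^2 + lambda * X). In the Scala sketch below, `RankOneShrinkage` and `solve` are hypothetical names. The data are made up: 1000 users, all user factors fixed at 0.1, and two users with r = 1 on item j. The sketch assumes c_ij = 1 + alpha * r_ij for observed entries and 1 otherwise, and it is not Spark code. It shows only the direction of the effect, not the magnitudes Ravi saw with rank 75 over 15 iterations.

```scala
// Rank-1 illustration; names and numbers are hypothetical, not Spark code.
object RankOneShrinkage {
  // Closed-form minimiser over the scalar v of
  //   sum_i c(i) * (p(i) - u(i) * v)^2 + lambda * x * v^2,
  // obtained by setting the derivative to zero:
  //   v = sum_i c(i) p(i) u(i) / (sum_i c(i) u(i)^2 + lambda * x)
  def solve(u: Array[Double], c: Array[Double], p: Array[Double],
            lambda: Double, x: Double): Double = {
    require(u.length == c.length && c.length == p.length, "length mismatch")
    val numerator = u.indices.map(i => c(i) * p(i) * u(i)).sum
    val denominator = u.indices.map(i => c(i) * u(i) * u(i)).sum + lambda * x
    numerator / denominator
  }

  def main(args: Array[String]): Unit = {
    val numUsers = 1000
    val alpha = 10.0
    val lambda = 0.01
    // Assumption: users 0 and 1 interacted with item j (r = 1), nobody else did,
    // and every user factor is the constant 0.1.
    val observed = Set(0, 1)
    val u = Array.fill(numUsers)(0.1)
    val p = Array.tabulate(numUsers)(i => if (observed(i)) 1.0 else 0.0)
    val c = Array.tabulate(numUsers)(i => if (observed(i)) 1.0 + alpha else 1.0)

    val vObserved = solve(u, c, p, lambda, x = observed.size.toDouble) // X = n_j (1.2-style)
    val vAllUsers = solve(u, c, p, lambda, x = numUsers.toDouble)      // X = m (1.3.0-style)
    println(f"X = n_j: v_j = $vObserved%.4f")
    println(f"X = m:   v_j = $vAllUsers%.4f")
  }
}
```

With these numbers the data term in the denominator is 10.2 in both cases. lambda * X grows from 0.02 to 10, so v_j drops from about 0.215 to about 0.109 with the same lambda. The sketch makes no attempt to model how this compounds across alternating user and item updates.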