Running with lambda=0 breaks the ALS code, since the matrices no longer stay positive definite and the Cholesky decomposition fails...
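A minimal sketch of one way this can happen, in plain Scala without Spark. This is not MLlib's solver: `CholeskyDemo`, `normalMatrix` and `cholesky` are hypothetical names used only for illustration. The assumption is a user with a single rated item at rank 2, so the normal-equation matrix (the sum of outer products of the item factors) is singular until lambda is added to the diagonal.

```scala
object CholeskyDemo {
  // Build A = sum_j v_j v_j^T + lambda * I for the given factor vectors.
  def normalMatrix(vs: Seq[Array[Double]], lambda: Double): Array[Array[Double]] = {
    val k = vs.head.length
    val a = Array.ofDim[Double](k, k)
    for (v <- vs; r <- 0 until k; c <- 0 until k) a(r)(c) += v(r) * v(c)
    for (d <- 0 until k) a(d)(d) += lambda
    a
  }

  // Textbook Cholesky factorization. Returns None when a pivot is not
  // strictly positive, i.e. the matrix is not positive definite.
  def cholesky(a: Array[Array[Double]]): Option[Array[Array[Double]]] = {
    val n = a.length
    val l = Array.ofDim[Double](n, n)
    var i = 0
    while (i < n) {
      var j = 0
      while (j <= i) {
        var s = a(i)(j)
        var p = 0
        while (p < j) { s -= l(i)(p) * l(j)(p); p += 1 }
        if (i == j) {
          if (s <= 0.0) return None
          l(i)(j) = math.sqrt(s)
        } else {
          l(i)(j) = s / l(j)(j)
        }
        j += 1
      }
      i += 1
    }
    Some(l)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical case: one rated item, rank 2.
    val factors = Seq(Array(1.0, 2.0))
    println(cholesky(normalMatrix(factors, 0.0)).isDefined)  // false: singular matrix
    println(cholesky(normalMatrix(factors, 1e-4)).isDefined) // true: lambda * I fixes it
  }
}
```

Any lambda > 0 makes the matrix strictly positive definite in exact arithmetic, because the sum of outer products is already positive semidefinite.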
Run with a very low lambda (I tested with 1e-4) and you should see the decrease in RMSE as you expect...

On Thu, Nov 27, 2014 at 3:04 AM, Kostas Kloudas <kklou...@gmail.com> wrote:

> Thanks a lot for your time guys and your quick replies!
>
> > On Nov 26, 2014, at 7:53 PM, Xiangrui Meng <men...@gmail.com> wrote:
> >
> > The training RMSE may increase due to regularization. Squared loss
> > only represents part of the global loss. If you watch the sum of the
> > squared loss and the regularization, it should be non-increasing.
> > -Xiangrui
> >
> > On Wed, Nov 26, 2014 at 9:53 AM, Sean Owen <so...@cloudera.com> wrote:
> >> I also modified the example to try 1, 5, 9, ... iterations as you did,
> >> and also ran with the same default parameters. I used the
> >> sample_movielens_data.txt file. Is that what you're using?
> >>
> >> My result is:
> >>
> >> Iteration 1 Test RMSE = 1.426079653593016 Train RMSE = 1.5013155094216357
> >> Iteration 5 Test RMSE = 1.405598012724468 Train RMSE = 1.4847078708333596
> >> Iteration 9 Test RMSE = 1.4055990901261632 Train RMSE = 1.484713206769993
> >> Iteration 13 Test RMSE = 1.4055990999738366 Train RMSE = 1.4847132332994588
> >> Iteration 17 Test RMSE = 1.40559910003368 Train RMSE = 1.48471323345531
> >> Iteration 21 Test RMSE = 1.4055991000342158 Train RMSE = 1.4847132334567061
> >> Iteration 25 Test RMSE = 1.4055991000342174 Train RMSE = 1.4847132334567108
> >>
> >> Train error is higher than test error, consistently, which could be
> >> underfitting. A higher rank=50 gets a reasonable result:
> >>
> >> Iteration 1 Test RMSE = 1.5981883186995312 Train RMSE = 1.4841671360432005
> >> Iteration 5 Test RMSE = 1.5745145659678204 Train RMSE = 1.4672341345080382
> >> Iteration 9 Test RMSE = 1.5745147110505406 Train RMSE = 1.4672385714907996
> >> Iteration 13 Test RMSE = 1.5745147108258577 Train RMSE = 1.4672385929631868
> >> Iteration 17 Test RMSE = 1.5745147108246424 Train RMSE = 1.4672385930428344
> >> Iteration 21 Test RMSE = 1.5745147108246367 Train RMSE = 1.4672385930431973
> >> Iteration 25 Test RMSE = 1.5745147108246367 Train RMSE = 1.467238593043199
> >>
> >> I'm not sure what the difference is. I looked at your modifications
> >> and they seem very similar. Is it the data you're using?
> >>
> >>
> >> On Wed, Nov 26, 2014 at 3:34 PM, Kostas Kloudas <kklou...@gmail.com> wrote:
> >>> For the training I am using the code in the MovieLensALS example with
> >>> trainImplicit set to false
> >>> and for the training RMSE I use the
> >>>
> >>> val rmseTr = computeRmse(model, training, params.implicitPrefs).
> >>>
> >>> The computeRmse() method is provided in the MovieLensALS class.
> >>>
> >>>
> >>> Thanks a lot,
> >>> Kostas
> >>>
> >>>
> >>>> On Nov 26, 2014, at 2:41 PM, Sean Owen <so...@cloudera.com> wrote:
> >>>>
> >>>> How are you computing RMSE?
> >>>> and how are you training the model -- not with trainImplicit right?
> >>>> I wonder if you are somehow optimizing something besides RMSE.
> >>>>
> >>>> On Wed, Nov 26, 2014 at 2:36 PM, Kostas Kloudas <kklou...@gmail.com> wrote:
> >>>>> Once again, the error even with the training dataset increases. The
> >>>>> results are:
> >>>>>
> >>>>> Running 1 iterations
> >>>>> For 1 iter.: Test RMSE = 1.2447121194304893 Training RMSE = 1.2394166987104076 (34.751317636 s).
> >>>>> Running 5 iterations
> >>>>> For 5 iter.: Test RMSE = 1.3253957117600659 Training RMSE = 1.3206317416138509 (37.693118023000004 s).
> >>>>> Running 9 iterations
> >>>>> For 9 iter.: Test RMSE = 1.3255293380139364 Training RMSE = 1.3207661218210436 (41.046175661 s).
> >>>>> Running 13 iterations
> >>>>> For 13 iter.: Test RMSE = 1.3255295352665748 Training RMSE = 1.3207663201865092 (47.763619515 s).
> >>>>> Running 17 iterations
> >>>>> For 17 iter.: Test RMSE = 1.32552953555787 Training RMSE = 1.3207663204794406 (59.682361103000005 s).
> >>>>> Running 21 iterations
> >>>>> For 21 iter.: Test RMSE = 1.3255295355583026 Training RMSE = 1.3207663204798756 (57.210578232 s).
> >>>>> Running 25 iterations
> >>>>> For 25 iter.: Test RMSE = 1.325529535558303 Training RMSE = 1.3207663204798765 (65.785485882 s).
> >>>>>
> >>>>> Thanks a lot,
> >>>>> Kostas
> >>>>>
> >>>>> On Nov 26, 2014, at 12:04 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
> >>>>>
> >>>>> copying user group - I keep replying directly vs reply all :)
> >>>>>
> >>>>> On Wed, Nov 26, 2014 at 2:03 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
> >>>>>>
> >>>>>> ALS will be guaranteed to decrease the squared error (therefore RMSE)
> >>>>>> in each iteration, on the training set.
> >>>>>>
> >>>>>> This does not hold for the test set / cross validation. You would
> >>>>>> expect the test set RMSE to stabilise as iterations increase, since
> >>>>>> the algorithm converges - but not necessarily to decrease.
> >>>>>>
> >>>>>> On Wed, Nov 26, 2014 at 1:57 PM, Kostas Kloudas <kklou...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I am getting familiarized with Mllib and a thing I noticed is that
> >>>>>>> running the MovieLensALS example on the movieLens dataset for
> >>>>>>> increasing number of iterations does not decrease the rmse.
> >>>>>>>
> >>>>>>> The results for 0.6% training set and 0.4% test are below. For
> >>>>>>> training set to 0.8%, the results are almost identical. Shouldn’t it
> >>>>>>> be normal to see a decreasing error? Especially going from 1 to 5
> >>>>>>> iterations.
> >>>>>>>
> >>>>>>> Running 1 iterations
> >>>>>>> Test RMSE for 1 iter. = 1.2452964343277886 (52.757125927000004 s).
> >>>>>>> Running 5 iterations
> >>>>>>> Test RMSE for 5 iter. = 1.3258973764470259 (61.183927666 s).
> >>>>>>> Running 9 iterations
> >>>>>>> Test RMSE for 9 iter. = 1.3260308117704385 (61.84948875800001 s).
> >>>>>>> Running 13 iterations
> >>>>>>> Test RMSE for 13 iter. = 1.3260310099809915 (73.799510125 s).
> >>>>>>> Running 17 iterations
> >>>>>>> Test RMSE for 17 iter. = 1.3260310102735398 (77.56512185300001 s).
> >>>>>>> Running 21 iterations
> >>>>>>> Test RMSE for 21 iter. = 1.3260310102739703 (79.607495074 s).
> >>>>>>> Running 25 iterations
> >>>>>>> Test RMSE for 25 iter. = 1.326031010273971 (88.631776301 s).
> >>>>>>> Running 29 iterations
> >>>>>>> Test RMSE for 29 iter. = 1.3260310102739712 (101.178383079 s).
> >>>>>>>
> >>>>>>> Thanks a lot,
> >>>>>>> Kostas
> >>>>>>> ---------------------------------------------------------------------
> >>>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >>>>>>> For additional commands, e-mail: user-h...@spark.apache.org
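To check the point quoted above (the sum of squared loss and regularization should be non-increasing, even when training RMSE alone is not), here is a toy sketch in plain Scala. It is not the MovieLensALS code or MLlib's ALS. It assumes rank 1, plain L2 regularization with lambda > 0 (MLlib's regularization scaling may differ), and a made-up six-rating data set. `AlsObjectiveDemo` and its methods are hypothetical names.

```scala
object AlsObjectiveDemo {
  // (user, item, rating) triples: made-up toy data, not MovieLens.
  val ratings: Seq[(Int, Int, Double)] = Seq(
    (0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
    (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0))
  val numUsers = 3
  val numItems = 3

  // Sum of squared errors over the training ratings.
  def sse(u: Array[Double], v: Array[Double]): Double =
    ratings.map { case (i, j, r) => val e = r - u(i) * v(j); e * e }.sum

  def rmse(u: Array[Double], v: Array[Double]): Double =
    math.sqrt(sse(u, v) / ratings.size)

  // Full objective: squared loss plus plain L2 regularization.
  def objective(u: Array[Double], v: Array[Double], lambda: Double): Double =
    sse(u, v) + lambda * (u.map(x => x * x).sum + v.map(x => x * x).sum)

  // Exact rank-1 ridge update for one side while the other side is held fixed.
  // Assumes lambda > 0 so that every denominator is strictly positive.
  def solve(n: Int, fixed: Array[Double], lambda: Double, users: Boolean): Array[Double] = {
    val num = new Array[Double](n)
    val den = Array.fill(n)(lambda)
    for ((i, j, r) <- ratings) {
      val (own, other) = if (users) (i, j) else (j, i)
      num(own) += r * fixed(other)
      den(own) += fixed(other) * fixed(other)
    }
    Array.tabulate(n)(k => num(k) / den(k))
  }

  // Returns (objective, training RMSE) after each full iteration.
  def run(iterations: Int, lambda: Double): Seq[(Double, Double)] = {
    var u = Array.fill(numUsers)(1.0)
    var v = Array.fill(numItems)(1.0)
    (1 to iterations).map { _ =>
      u = solve(numUsers, v, lambda, users = true)
      v = solve(numItems, u, lambda, users = false)
      (objective(u, v, lambda), rmse(u, v))
    }
  }

  def main(args: Array[String]): Unit =
    run(5, 0.1).zipWithIndex.foreach { case ((obj, err), t) =>
      println(s"iter ${t + 1}: objective = $obj, train RMSE = $err")
    }
}
```

Each half-step exactly minimizes the objective over one side, so only the objective column is guaranteed not to go up. The RMSE column is printed alongside for comparison and carries no such guarantee.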