Yes, that's fine input then. Large alpha should go with small R values, not large R. Really, alpha controls how strongly observed input (R != 0) is weighted towards 1 versus how strongly unobserved input (R = 0) is weighted towards 0. I scale lambda by alpha to complete this effect.
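To make the alpha / R trade-off concrete, here is a minimal sketch of the confidence weighting as defined in "Collaborative Filtering for Implicit Feedback Datasets": the target is p = 1 when r > 0 and p = 0 otherwise, and the weight on each squared error is c = 1 + alpha * r. The function names are made up for illustration and are not taken from ParallelALS or any other library. The lambda scaling mentioned above is left out, since its exact form isn't spelled out here.

```python
# Hypothetical helper names; only the formulas come from the paper.

def preference(r):
    """Binary target the factorization tries to reproduce (paper: r > 0)."""
    return 1.0 if r > 0 else 0.0


def confidence(r, alpha):
    """Weight on the squared error for one (user, item) cell."""
    return 1.0 + alpha * r


def weighted_error(r, prediction, alpha):
    """One term of the loss for a single cell, before regularization."""
    return confidence(r, alpha) * (preference(r) - prediction) ** 2


# Small R with a large alpha and large R with a small alpha put the
# same weight on an observed cell.
print(confidence(1, alpha=40))   # 41.0
print(confidence(40, alpha=1))   # 41.0

# Unobserved cells (r = 0) always get weight 1, whatever alpha is.
print(confidence(0, alpha=40))   # 1.0
```

So if the play counts are already in the tens or hundreds, an alpha of 40 makes observed cells dominate the loss almost completely, which is one way to read why a smaller alpha can work better on such data.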

On Mon, Mar 18, 2013 at 1:06 PM, Han JU <ju.han.fe...@gmail.com> wrote:

> Thanks for the quick responses.
>
> Yes, it's that dataset. What I'm using is triplets of "user_id song_id
> play_times", for ~1m users. No audio data, just plain text triples.
>
> It seems to me that the "implicit feedback" paper matches this dataset
> well: no explicit ratings, just the number of times a song was played.
>
> Thank you Sean for the alpha value. I think they use big numbers because
> the values in their R matrix are big.
>
>
> 2013/3/18 Sebastian Schelter <ssc.o...@googlemail.com>
>
> > JU,
> >
> > are you referring to this dataset?
> >
> > http://labrosa.ee.columbia.edu/millionsong/tasteprofile
> >
> > On 18.03.2013 17:47, Sean Owen wrote:
> > > One word of caution: there are at least two papers on ALS, and they
> > > define lambda differently. I think you are talking about "Collaborative
> > > Filtering for Implicit Feedback Datasets".
> > >
> > > I've been working with some folks who point out that alpha=40 seems to
> > > be too high for most data sets. After running some tests on common data
> > > sets, alpha=1 looks much better. YMMV.
> > >
> > > In the end you have to evaluate these two parameters, and the # of
> > > features, across a range to determine what's best.
> > >
> > > Is this data set not a bunch of audio features? I am not sure it works
> > > for ALS, not naturally at least.
> > >
> > >
> > > On Mon, Mar 18, 2013 at 12:39 PM, Han JU <ju.han.fe...@gmail.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> I'm wondering whether anyone has tried the ParallelALS job with
> > >> implicit feedback on the million song dataset? Any pointers on alpha
> > >> and lambda?
> > >>
> > >> In the paper alpha is 40 and lambda is 150, but I don't know what the
> > >> r values in their matrix are. They said it is based on the time units
> > >> that users have watched a show, so maybe they are big.
> > >>
> > >> Many thanks!
> > >> --
> > >> *JU Han*
> > >>
> > >> UTC - Université de Technologie de Compiègne
> > >> *GI06 - Fouille de Données et Décisionnel*
> > >>
> > >> +33 0619608888
>
> --
> *JU Han*
>
> Software Engineer Intern @ KXEN Inc.
> UTC - Université de Technologie de Compiègne
> *GI06 - Fouille de Données et Décisionnel*
>
> +33 0619608888