Yes, that's fine input then.

Large alpha should go with small R values, not large R. Really, alpha
controls how strongly observed input (R != 0) is weighted towards 1 versus
how strongly unobserved input (R = 0) is weighted towards 0. I scale lambda
by alpha to complete this effect.
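
A minimal sketch of that weighting, assuming the formulation in
"Collaborative Filtering for Implicit Feedback Datasets": preference
p = 1 if r > 0 else 0, and confidence c = 1 + alpha * r. The function names
and play counts below are made up for illustration, and this is not the
ParallelALS code itself.

```python
def preference(r):
    """Binary target: 1 for observed input (r > 0), 0 for unobserved."""
    return 1.0 if r > 0 else 0.0


def confidence(r, alpha):
    """Weight on the squared error for one cell: c = 1 + alpha * r."""
    return 1.0 + alpha * r


# Hypothetical play counts for one user; 0 means the song was never played.
plays = [0, 1, 5, 100]

for alpha in (1.0, 40.0):
    weights = [confidence(r, alpha) for r in plays]
    targets = [preference(r) for r in plays]
    print(f"alpha={alpha}: targets={targets} weights={weights}")
```

With alpha = 40, a play count of 100 gets a weight of 4001 against 1 for an
unobserved cell, while with alpha = 1 the same cell gets 101. That gap is why
a large alpha pairs more naturally with small R values.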


On Mon, Mar 18, 2013 at 1:06 PM, Han JU <ju.han.fe...@gmail.com> wrote:

> Thanks for quick responses.
>
> Yes, it's that dataset. What I'm using is triplets of "user_id song_id
> play_times" for ~1M users. No audio features, just plain text triples.
>
> It seems to me that the paper about "implicit feedback" matches this
> dataset well: no explicit ratings, just how many times a user played a song.
>
> Thank you Sean for the alpha value. I think they use big numbers because
> the values in their R matrix are big.
>
>
> 2013/3/18 Sebastian Schelter <ssc.o...@googlemail.com>
>
> > JU,
> >
> > are you referring to this dataset?
> >
> > http://labrosa.ee.columbia.edu/millionsong/tasteprofile
> >
> > On 18.03.2013 17:47, Sean Owen wrote:
> > > One word of caution: there are at least two papers on ALS, and they
> > > define lambda differently. I think you are talking about "Collaborative
> > > Filtering for Implicit Feedback Datasets".
> > >
> > > I've been working with some folks who point out that alpha=40 seems to
> > > be too high for most data sets. After running some tests on common data
> > > sets, alpha=1 looks much better. YMMV.
> > >
> > > In the end you have to evaluate these two parameters, and the # of
> > > features, across a range to determine what's best.
> > >
> > > Is this data set not a bunch of audio features? I am not sure it works
> > > for ALS, not naturally at least.
> > >
> > >
> > > On Mon, Mar 18, 2013 at 12:39 PM, Han JU <ju.han.fe...@gmail.com> wrote:
> > >
> > >> Hi,
> > >>
> > >> I'm wondering whether anyone has tried the ParallelALS job with implicit
> > >> feedback on the Million Song Dataset? Any pointers on alpha and lambda?
> > >>
> > >> In the paper alpha is 40 and lambda is 150, but I don't know what the
> > >> r values in their matrix are. They say it is based on the time units
> > >> that users have watched a show, so maybe the values are big.
> > >>
> > >> Many thanks!
> > >> --
> > >> *JU Han*
> > >>
> > >> UTC   -  Université de Technologie de Compiègne
> > >> *     **GI06 - Fouille de Données et Décisionnel*
> > >>
> > >> +33 0619608888
> > >>
> > >
> >
> >
>
>
> --
> *JU Han*
>
> Software Engineer Intern @ KXEN Inc.
> UTC   -  Université de Technologie de Compiègne
> *     **GI06 - Fouille de Données et Décisionnel*
>
> +33 0619608888
>
