Any algorithm is non-deterministic because of the non-deterministic behavior of
the underlying hardware, of course :) But that's off-topic. I'm talking about a
specific implementation of a specific algorithm, and in general I'd like to know
that at least some very general properties of the algorithm are preserved by the
implementation (and why the authors added an intentional non-deterministic
component to the implementation).
> Date: Mon, 24 Jun 2013 14:43:59 -0700
> Subject: Re: Consistent repeatable results for distributed ALS-WR recommender
> From: dlie...@gmail.com
> To: user@mahout.apache.org
>
> The point about the non-determinism of parallel processing is well known. It
> was a joke, a reminder to be careful with absolute statements like "never
> exists", as they are very hard to prove. Bringing more positive examples
> still does not prove an absolute statement, or make it any stronger from the
> mathematical-logic point of view, whereas a single counter-example is enough
> to disprove it. :)
>
>
> On Mon, Jun 24, 2013 at 2:29 PM, Koobas <koo...@gmail.com> wrote:
>
> > On Mon, Jun 24, 2013 at 5:07 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> > wrote:
> >
> > > On Mon, Jun 24, 2013 at 1:35 PM, Michael Kazekin <kazm...@hotmail.com
> > > >wrote:
> > >
> > > > I agree with you; I should have mentioned earlier that it would be good to
> > > > separate "noise from data" and deal only with what is separable. Of course
> > > > there is no truly deterministic implementation of any algorithm,
> > >
> > >
> > > I am pretty sure "2.0 + 2.0" is pretty deterministic :)
> > >
> > >
> > Few things are naturally deterministic in parallel computing.
> > Many parallel sorting algorithms are non-deterministic.
> >
> > In floating point, associativity is gone.
> > So, while 2.0 + 2.0 is deterministic, summing the same terms in two different
> > orders can give two different results, and you don't know in what order the
> > partial sums are formed if the reduction is done in parallel.
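
(To make the order-dependence concrete -- a minimal plain-Java sketch, not Mahout code; the class name is just for illustration:)

    // Floating-point addition is not associative, so the grouping of partial sums matters.
    public class SummationOrder {
      public static void main(String[] args) {
        double a = 1.0e16, b = -1.0e16, c = 1.0;
        System.out.println((a + b) + c); // prints 1.0
        System.out.println(a + (b + c)); // prints 0.0: c is lost to rounding in (b + c) before a cancels b
        // A parallel reduction picks its own grouping of partial sums, so the exact
        // result can differ between runs even though each partial sum is rounded correctly.
      }
    }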
> >
> >
> >
> > > > but I would expect to see "credible" results on a macro-level (in our case
> > > > it would be nice to see the same order of recommendations given a fixed
> > > > seed). It seems important for experiments (and for testing, as mentioned),
> > > > doesn't it?
> > > >
> > >
> > > Yes, for unit tests you usually want to fix the seed if an assertion would
> > > otherwise fail with non-zero probability. There are definitely a lot of such
> > > cases in Mahout.
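
(Side note: a minimal sketch of what fixing the seed buys you in a test -- plain java.util.Random, not Mahout's actual test code; the names are made up for illustration:)

    import java.util.Arrays;
    import java.util.Random;

    // With a fixed seed the "random" values are reproducible, so an assertion on the
    // output is stable; with an unseeded Random the same assertion could fail with
    // non-zero probability.
    public class SeededTestSketch {
      static double[] randomVector(long seed, int n) {
        Random rng = new Random(seed);
        double[] v = new double[n];
        for (int i = 0; i < n; i++) {
          v[i] = rng.nextDouble();
        }
        return v;
      }

      public static void main(String[] args) {
        double[] first = randomVector(42L, 5);
        double[] second = randomVector(42L, 5);
        System.out.println(Arrays.equals(first, second)); // always true: same seed, same sequence
      }
    }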
> > >
> > > > Another question: afaik ALS-WR is deterministic by design, so I'm trying to
> > > > understand the reasons (and I'm assuming there are some) for this specific
> > > > implementation choice.
> > > >
> > > > Thanks for a free lunch! ;)
> > > > Cheers, Mike.
> > > >
> > > > > Date: Mon, 24 Jun 2013 13:13:20 -0700
> > > > > Subject: Re: Consistent repeatable results for distributed ALS-WR
> > > > recommender
> > > > > From: dlie...@gmail.com
> > > > > To: user@mahout.apache.org
> > > > >
> > > > > On Mon, Jun 24, 2013 at 1:07 PM, Michael Kazekin <kazm...@hotmail.com> wrote:
> > > > >
> > > > > > Thank you, Ted!
> > > > > > Any feedback on the usefulness of such functionality? Could it increase
> > > > > > the 'playability' of the recommender?
> > > > > >
> > > > >
> > > > > Almost all methods -- even deterministic ones -- will have a "credible
> > > > > interval" of prediction simply because method assumptions do not hold 100%
> > > > > in real life, on real data. So what you really want to know in such cases
> > > > > is the credible interval rather than whether the method is deterministic or
> > > > > not. Non-deterministic methods might very well be more accurate than
> > > > > deterministic ones in this context, and, therefore, more "useful". Also
> > > > > see: "no free lunch theorem".
> > > > >
> > > > >
> > > > > > > From: ted.dunn...@gmail.com
> > > > > > > Date: Mon, 24 Jun 2013 20:46:43 +0100
> > > > > > > Subject: Re: Consistent repeatable results for distributed ALS-WR
> > > > > > recommender
> > > > > > > To: user@mahout.apache.org
> > > > > > >
> > > > > > > See org.apache.mahout.common.RandomUtils#useTestSeed
> > > > > > >
> > > > > > > It provides the ability to freeze the initial seed. Normally this is
> > > > > > > only used during testing, but you could use it.
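
(The call itself is a one-liner; a hypothetical sketch -- only RandomUtils comes from Mahout, the surrounding setup is whatever your own job looks like:)

    import org.apache.mahout.common.RandomUtils;

    public class ReproducibleRun {
      public static void main(String[] args) throws Exception {
        // Freeze Mahout's random seed before anything draws random numbers,
        // e.g. before the factorization initializes its matrices.
        RandomUtils.useTestSeed();

        // ... build and run the factorizer / recommender as usual here;
        // repeated runs on the same input should then start from the same seed.
      }
    }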
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Jun 24, 2013 at 8:44 PM, Michael Kazekin <kazm...@hotmail.com> wrote:
> > > > > > >
> > > > > > > > Thanks a lot!
> > > > > > > > Do you know, by any chance, the underlying reasons for including such
> > > > > > > > mandatory random seed initialization?
> > > > > > > > Do you see any sense in providing another option, such as filling them
> > > > > > > > with zeroes, in order to ensure consistency and repeatability? (For
> > > > > > > > example, we might want to track and compare the generated
> > > > > > > > recommendation lists for different parameters, such as the number of
> > > > > > > > features or the number of iterations, etc.)
> > > > > > > > M.
> > > > > > > >
> > > > > > > >
> > > > > > > > > Date: Mon, 24 Jun 2013 19:51:44 +0200
> > > > > > > > > Subject: Re: Consistent repeatable results for distributed
> > > ALS-WR
> > > > > > > > recommender
> > > > > > > > > From: s...@apache.org
> > > > > > > > > To: user@mahout.apache.org
> > > > > > > > >
> > > > > > > > > The matrices of the factorization are initialized randomly. If you
> > > > > > > > > fix the random seed (which would require modification of the code)
> > > > > > > > > you should get exactly the same results.
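
(Roughly what "initialized randomly" means here -- a simplified sketch of seeding a factor matrix, not the actual Mahout implementation; names and dimensions are made up:)

    import java.util.Random;

    // ALS-style factorizations typically start the item-feature (or user-feature)
    // matrix from random values; the seed of this RNG determines whether two runs
    // start from the same point.
    public class RandomFactorInit {
      static double[][] initFactors(int numItems, int numFeatures, long seed) {
        Random rng = new Random(seed);
        double[][] m = new double[numItems][numFeatures];
        for (int i = 0; i < numItems; i++) {
          for (int f = 0; f < numFeatures; f++) {
            m[i][f] = rng.nextDouble();
          }
        }
        return m;
      }

      public static void main(String[] args) {
        double[][] factors = initFactors(100, 10, 1234L); // fixed seed -> reproducible start
        System.out.println(factors.length + " x " + factors[0].length);
      }
    }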
> > > > > > > > > On 24.06.2013 13:49, "Michael Kazekin" <kazm...@hotmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hi!
> > > > > > > > > > Should I assume that with the same dataset and the same parameters
> > > > > > > > > > for the factorizer and the recommender I will get the same results
> > > > > > > > > > for any specific user?
> > > > > > > > > > My current understanding is that theoretically the ALS-WR algorithm
> > > > > > > > > > could guarantee this, but I was wondering whether there could be any
> > > > > > > > > > numerical-method issues and/or implementation-specific concerns.
> > > > > > > > > > Would appreciate any insight on this issue.
> > > > > > > > > > Mike.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > >
> > > >
> > >
> >