Currently this is not supported. If you want to do incremental fold-in of
new data you would need to do it outside of Spark (e.g. see this
discussion:
https://mail-archives.apache.org/mod_mbox/spark-user/201603.mbox/browser,
which also mentions a streaming on-line MF implementation with SGD).
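To give a flavor of what such an on-line update looks like, here is a minimal NumPy sketch (not Spark code; the function name, learning rate, and regularization are all made up): a single SGD step on one new observation adjusts only the two affected factor vectors instead of refitting the whole model.

```python
import numpy as np

# Hypothetical sketch of an on-line MF update via SGD (not Spark code):
# for one new observation (user u, item i, rating r), nudge only the
# affected user and item factor vectors toward the new rating.
def sgd_step(X, Y, u, i, r, lr=0.01, reg=0.1):
    err = r - X[u] @ Y[i]                 # prediction error on the new point
    xu = X[u].copy()                      # keep old user vector for the item update
    X[u] += lr * (err * Y[i] - reg * X[u])
    Y[i] += lr * (err * xu - reg * Y[i])
    return err

rng = np.random.default_rng(0)
X = rng.normal(scale=0.1, size=(10, 4))   # user factors
Y = rng.normal(scale=0.1, size=(8, 4))    # item factors
for _ in range(200):                      # repeated steps drive the error down
    sgd_step(X, Y, u=2, i=5, r=4.0)
```

With regularization the prediction converges near (not exactly to) the observed rating; a streaming system would apply one such step per incoming event.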

In general, for serving, MF models are exported to a separate serving
system, so that system may be better placed to do the actual fold-in.
Sean's Oryx project does that, though I'm not sure offhand whether that
part is done in Spark or not.
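For the explicit-feedback case, the fold-in itself is just the per-user least-squares step from ALS, which a serving system can run against the fixed item factors (the implicit-feedback variant replaces this with the confidence-weighted normal equations). A rough NumPy sketch, with the function name and regularization value chosen arbitrarily:

```python
import numpy as np

# Hypothetical fold-in sketch: given the trained item-factor matrix Y
# (n_items x k) held FIXED, solve the ridge-regression subproblem for
# one new user's factor vector -- the same per-user step ALS performs,
# done outside Spark in the serving layer.
def fold_in_user(Y, item_ids, ratings, reg=0.1):
    Y_sub = Y[item_ids]                    # factors of the items the user rated
    k = Y.shape[1]
    A = Y_sub.T @ Y_sub + reg * np.eye(k)  # (k x k) normal equations
    b = Y_sub.T @ ratings
    return np.linalg.solve(A, b)           # new user's factor vector

rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 8))              # stand-in for trained item factors
x = fold_in_user(Y, [3, 17, 42], np.array([5.0, 3.0, 4.0]))
```

The new user's vector can then be dotted against item factors to serve recommendations immediately, without touching the batch-trained model.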

I know Sean's old Myrrix project also used to support computing ALS with an
initial set of input factors, so you could in theory incrementally compute
on new data. I'm not sure if the newer Oryx project supports it though.

@Sean, what are your thoughts on supporting an initial model (factors) in
ALS? I personally have always just recomputed the model, but for very
large-scale workloads it obviously makes a lot of sense. What I'm not sure
about is whether it gives solutions as good as a full recompute - I'd
imagine it will tend to find a slightly better local minimum when started
from a previous local minimum... with the advantage that new users / items
are incorporated. And of course users can still do a full recompute
periodically.
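For concreteness, here is what warm-starting looks like in a toy dense NumPy version of explicit-feedback ALS (purely illustrative, nothing to do with MLlib's internals): seed the factor matrices from a previous model instead of random init, append rows for any new users/items, and run just a few sweeps. Each row update exactly minimizes the regularized objective in that row, so the objective can only go down from the warm start.

```python
import numpy as np

# Toy dense ALS (explicit feedback), purely illustrative -- not the
# MLlib implementation. R holds ratings with 0 meaning "unobserved".
def als_sweeps(R, X, Y, reg=0.1, n_sweeps=3):
    k = X.shape[1]
    for _ in range(n_sweeps):
        for u in range(R.shape[0]):          # update each user's factors
            obs = R[u] != 0
            Ys = Y[obs]
            X[u] = np.linalg.solve(Ys.T @ Ys + reg * np.eye(k),
                                   Ys.T @ R[u, obs])
        for i in range(R.shape[1]):          # then each item's factors
            obs = R[:, i] != 0
            Xs = X[obs]
            Y[i] = np.linalg.solve(Xs.T @ Xs + reg * np.eye(k),
                                   Xs.T @ R[obs, i])
    return X, Y

def objective(R, X, Y, reg=0.1):
    mask = R != 0
    return (np.sum((R - X @ Y.T)[mask] ** 2)
            + reg * (np.sum(X ** 2) + np.sum(Y ** 2)))

rng = np.random.default_rng(1)
R = np.zeros((6, 5))
R[rng.integers(0, 6, 12), rng.integers(0, 5, 12)] = \
    rng.integers(1, 6, 12).astype(float)
# Stand-ins for a previous model's factors; a real warm start would load
# the old run's X/Y and append freshly initialized rows for new users/items.
X0 = rng.normal(scale=0.1, size=(6, 3))
Y0 = rng.normal(scale=0.1, size=(5, 3))
before = objective(R, X0, Y0)
X, Y = als_sweeps(R, X0.copy(), Y0.copy())
```

Whether the local minimum it settles into stays competitive with a full random-restart recompute over many incremental rounds is exactly the open question above.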


On Fri, 11 Mar 2016 at 13:04 Roberto Pagliari <roberto.pagli...@asos.com>
wrote:

> In the current implementation of ALS with implicit feedback, when new data
> come in, it is not possible to update user/product matrices without
> re-computing everything.
>
> Is this feature planned, or is there any known workaround?
>
> Thank you,
>
>
