On Fri, Mar 11, 2016 at 12:18 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:

> In general, for serving situations MF models are stored in some other
> serving system, so that system may be better suited to do the actual
> fold-in. Sean's Oryx project does that, though I'm not sure offhand if that
> part is done in Spark or not.
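(For context, the "fold-in" here is the per-user step ALS already performs: with the item factors held fixed, a new user's factor vector is the solution of a small regularized least-squares system, which is cheap enough to do in a serving layer. A rough numpy sketch, with all names illustrative rather than taken from Oryx:)

```python
# Hypothetical sketch of MF "fold-in": given a fixed item-factor matrix Y
# and a new user's ratings, solve the regularized normal equations for that
# user's factor vector, exactly as one half-step of ALS does.
import numpy as np

def fold_in_user(Y, item_ids, ratings, reg=0.1):
    """Compute a factor vector for a new user from the items they rated.

    Y        : (num_items, k) fixed item-factor matrix
    item_ids : indices of items the user interacted with
    ratings  : corresponding ratings / preference values
    reg      : L2 regularization strength (lambda)
    """
    Y_sub = Y[item_ids]                       # (n, k) factors of rated items
    k = Y.shape[1]
    # Normal equations: (Y_sub^T Y_sub + reg * I) x = Y_sub^T r
    A = Y_sub.T @ Y_sub + reg * np.eye(k)
    b = Y_sub.T @ np.asarray(ratings, dtype=float)
    return np.linalg.solve(A, b)

# Example: 5 items with k=2 factors; a new user rated items 0 and 3 highly
Y = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.9, 0.1], [0.2, 0.8]])
x_u = fold_in_user(Y, [0, 3], [4.0, 5.0])
scores = Y @ x_u                              # predicted preference per item
```

Since this is just a k-by-k solve per new user, it needs none of Spark's machinery at serving time, which is why it can live in a separate system holding the factor arrays in memory.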
(No, this part isn't Spark; it's just manipulating arrays in memory. Building the model is done in Spark, as is marshalling the input from a Kafka topic.)

> I know Sean's old Myrrix project also used to support computing ALS with an
> initial set of input factors, so you could in theory incrementally compute
> on new data. I'm not sure if the newer Oryx project supports it though.

(Yes, exactly the same thing exists in Oryx.)

> @Sean, what are your thoughts on supporting an initial model (factors) in
> ALS? I personally have always just recomputed the model, but for very large
> scale stuff it can make a lot of sense obviously. What I'm not sure on is
> whether it gives good solutions (relative to recomputing) - I'd imagine it
> will tend to find a slightly better local minimum given a previous local
> minimum starting point... with the advantage that new users / items are
> incorporated. But of course users can do a full recompute periodically.

I'd prefer to be able to specify an initial model. The initial model typically takes 20-40 iterations to converge to a reasonable state, but given a relatively small number of additional inputs, only a few more iterations are needed to reach the same convergence threshold. The difference can be a lot of compute time. This is one of the few capabilities that got worse when I moved to Spark, since it was lost there; I've just been too lazy to implement it. But that would be cool.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
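(The warm-start idea discussed above can be sketched in plain numpy: run the usual alternating least-squares sweeps, but seed them with a previous model's factors instead of a random initialization, so only a few sweeps are needed. All names and the dense-matrix setup are illustrative; Spark's ALS works on sparse, distributed data and does not currently accept initial factors.)

```python
# Illustrative warm-started ALS: alternate exact least-squares updates for
# user and item factors, starting from prior factors U0, V0 rather than a
# random init. Dense ratings matrix R is a simplifying assumption.
import numpy as np

def als_step(Rmat, F, reg):
    """For each row r of Rmat, solve (F^T F + reg*I) x = F^T r.

    Rmat : (n_target, n_fixed) ratings with the fixed side on columns
    F    : (n_fixed, k) factors held fixed this half-step
    """
    k = F.shape[1]
    A = F.T @ F + reg * np.eye(k)
    return np.linalg.solve(A, F.T @ Rmat.T).T   # (n_target, k)

def als(R, U0, V0, reg=0.1, iters=5):
    """Run `iters` ALS sweeps on R (users x items) from initial factors."""
    U, V = U0.copy(), V0.copy()
    for _ in range(iters):
        U = als_step(R, V, reg)     # update user factors, items fixed
        V = als_step(R.T, U, reg)   # update item factors, users fixed
    return U, V
```

Because each half-step exactly minimizes the regularized objective over one factor block, the objective is monotonically non-increasing, so starting from an already-converged model means the few extra sweeps mostly just absorb the new users/items and ratings.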