Thanks, Nick, for your suggestions.

On Sun, Mar 15, 2015 at 10:41 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
> As Sean says, precomputing recommendations is pretty inefficient. Though
> with 500K items it's easy to get all the item vectors in memory, so
> pre-computing is not too bad.
>
> Still, since you plan to serve these via a REST service anyway, computing
> on demand via a serving layer such as Oryx or PredictionIO (or the newly
> open-sourced Seldon.io) is a good option. You can also cache the
> recommendations quite aggressively - once you compute a user or item
> top-K list, just stick the result in memcache / Redis / whatever and
> evict it when you recompute your offline model, or every hour or so.
>
> On Sun, Mar 15, 2015 at 3:03 PM, Shashidhar Rao
> <raoshashidhar...@gmail.com> wrote:
>
>> Thanks Sean, your suggestions and the links provided are just what I
>> needed to start off with.
>>
>> On Sun, Mar 15, 2015 at 6:16 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> I think you're assuming that you will pre-compute recommendations and
>>> store them in Mongo. That's one way to go, with certain trade-offs.
>>> You can precompute offline easily and serve results at large scale
>>> easily, but you are forced to precompute everything -- lots of wasted
>>> effort, and it's never completely up to date.
>>>
>>> The front-end part of the stack looks right.
>>>
>>> Spark would do the model building; you'd have to write a process to
>>> score recommendations and store the result. Mahout is the same thing,
>>> really.
>>>
>>> 500K items isn't all that large. Your requirements aren't driven just
>>> by items, though. The number of users and latent features matter too,
>>> as does how often you want to rebuild the model. I'm guessing you
>>> would get away with a handful of modern machines for a problem this
>>> size.
>>>
>>> In a way, what you describe reminds me of Wibidata, since it built
>>> recommender-like solutions on top of data and results published to a
>>> NoSQL store.
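[Editor's note: Nick's cache-and-evict pattern above can be sketched in a few lines. The sketch below is a pure-Python stand-in that uses an in-process dict where a real deployment would use Redis or memcache; the names `fetch_topk`, `invalidate_all`, and `TTL_SECONDS` are illustrative, not from any real client library.]

```python
import time

TTL_SECONDS = 3600  # evict roughly every hour, per the suggestion above
_cache = {}         # stand-in for Redis/memcache: user_id -> (recs, stored_at)

def fetch_topk(user_id, compute_fn):
    """Return the cached top-K list for user_id, recomputing only after
    the TTL expires or the cache has been invalidated."""
    entry = _cache.get(user_id)
    now = time.time()
    if entry is not None and now - entry[1] < TTL_SECONDS:
        return entry[0]              # cache hit: serve without touching the model
    recs = compute_fn(user_id)       # the expensive part: score against the model
    _cache[user_id] = (recs, now)
    return recs

def invalidate_all():
    """Call this after recomputing the offline model, so stale lists are evicted."""
    _cache.clear()
```

The same shape works for item-to-item top-K lists; only the key changes.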
>>> You might glance at the related OSS project Kiji
>>> (http://kiji.org/) for ideas about how to manage the schema.
>>>
>>> You should have a look at things like Nick's architecture for
>>> Graphflow; however, it's more concerned with computing recommendations
>>> on the fly, and describes a shift away from an architecture originally
>>> built around something like a NoSQL store:
>>>
>>> http://spark-summit.org/wp-content/uploads/2014/07/Using-Spark-and-Shark-to-Power-a-Realt-time-Recommendation-and-Customer-Intelligence-Platform-Nick-Pentreath.pdf
>>>
>>> This is also the kind of ground the Oryx project is intended to cover,
>>> something I've worked on personally:
>>> https://github.com/OryxProject/oryx -- a layer on and around the
>>> core model building in Spark + Spark Streaming to provide a whole
>>> recommender (for example), down to the REST API.
>>>
>>> On Sun, Mar 15, 2015 at 10:45 AM, Shashidhar Rao
>>> <raoshashidhar...@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > Can anyone who has developed a recommendation engine suggest what
>>> > could be the possible software stack for such an application?
>>> >
>>> > I am basically new to recommendation engines; I just found out about
>>> > Mahout and Spark MLlib, which are available. I am thinking of the
>>> > software stack below:
>>> >
>>> > 1. The user is going to use an Android app.
>>> > 2. A REST API call is sent from the Android app to the app server to
>>> > get recommendations.
>>> > 3. Spark MLlib as the core recommendation engine.
>>> > 4. MongoDB as the database backend.
>>> >
>>> > I would like to know more about the cluster configuration (how many
>>> > nodes etc.) for Spark when calculating recommendations for 500,000
>>> > items. These items include products for day care etc.
>>> >
>>> > Other software stack suggestions would also be very useful. It has to
>>> > run on multiple vendor machines.
>>> >
>>> > Please suggest.
>>> >
>>> > Thanks
>>> > Shashi
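[Editor's note: on the "item vectors in memory" point above — a factor model like the one Spark MLlib's ALS produces reduces serving to dot products, and 500K item vectors at, say, 50 latent features is only on the order of 100-200 MB, so on-demand scoring is cheap. The sketch below is a pure-Python stand-in (a real service would use MLlib's model or NumPy); `top_k` and the toy item names are illustrative.]

```python
import heapq

def top_k(user_vec, item_vecs, k=10):
    """Return the k item ids whose latent vectors score highest (by dot
    product) against the given user's latent vector.

    user_vec:  list of floats, the user's latent factors
    item_vecs: dict of item_id -> list of floats, held entirely in memory
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    # heapq.nlargest scans all items once: O(n log k) for n items
    return heapq.nlargest(k, item_vecs, key=lambda item: dot(user_vec, item_vecs[item]))
```

Combined with the caching pattern discussed earlier in the thread, this is essentially what the serving layers mentioned above (Oryx, PredictionIO) do for you behind a REST API.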