Thanks, Nick, for your suggestions.

On Sun, Mar 15, 2015 at 10:41 PM, Nick Pentreath <nick.pentre...@gmail.com>
wrote:

> As Sean says, precomputing recommendations is pretty inefficient. Though
> with 500k items it's easy to get all the item vectors in memory, so
> precomputing is not too bad.
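>
> For illustration, a minimal sketch of that in-memory approach in Scala
> (assuming the ALS item factors have already been collected to the driver;
> dot and topK are just illustrative helpers):
>
>     // Brute-force top-K over item factors held in memory.
>     // At 500k items each query is a cheap linear scan of dot products.
>     def dot(a: Array[Double], b: Array[Double]): Double =
>       a.zip(b).map { case (x, y) => x * y }.sum
>
>     def topK(query: Array[Double],
>              items: Array[(Int, Array[Double])],
>              k: Int): Array[(Int, Double)] =
>       items.map { case (id, vec) => (id, dot(query, vec)) }
>         .sortBy(-_._2)
>         .take(k)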
>
> Still, since you plan to serve these via a REST service anyway, computing
> on demand via a serving layer such as Oryx or PredictionIO (or the newly
> open-sourced Seldon.io) is a good option. You can also cache the
> recommendations quite aggressively - once you compute a user or item top-K
> list, just stick the result in memcached / Redis / whatever and evict it
> when you recompute your offline model, or simply every hour.
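>
> A sketch of that cache-aside pattern with Jedis (the key scheme and the
> computeTopK stand-in are hypothetical):
>
>     import redis.clients.jedis.Jedis
>
>     val jedis = new Jedis("localhost", 6379)
>
>     // Stand-in for real on-demand scoring (e.g. the in-memory scan
>     // above), returning a serialized top-K list.
>     def computeTopK(userId: Int): String =
>       s"""{"user":$userId,"items":[]}"""
>
>     // Cache-aside: serve from Redis if present, else compute,
>     // cache with a TTL, and return.
>     def recommendationsFor(userId: Int): String = {
>       val key = s"recs:user:$userId"
>       Option(jedis.get(key)).getOrElse {
>         val fresh = computeTopK(userId)
>         jedis.setex(key, 3600, fresh) // evict after an hour
>         fresh
>       }
>     }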
>
>
>
>
> On Sun, Mar 15, 2015 at 3:03 PM, Shashidhar Rao <
> raoshashidhar...@gmail.com> wrote:
>
>> Thanks, Sean, your suggestions and the links provided are just what I
>> needed to get started.
>>
>> On Sun, Mar 15, 2015 at 6:16 PM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> I think you're assuming that you will pre-compute recommendations and
>>> store them in Mongo. That's one way to go, with certain tradeoffs. You
>>> can precompute offline easily, and serve results at large scale
>>> easily, but you are forced to precompute everything -- lots of wasted
>>> effort, and the results are never completely up to date.
>>>
>>> The front-end part of the stack looks right.
>>>
>>> Spark would do the model building; you'd have to write a process to
>>> score recommendations and store the result. Mahout is the same thing,
>>> really.
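>>>
>>> A rough sketch of that batch process with MLlib's ALS (paths and
>>> parameters are made up, and the actual MongoDB write is left as a
>>> placeholder):
>>>
>>>     import org.apache.spark.SparkContext
>>>     import org.apache.spark.mllib.recommendation.{ALS, Rating}
>>>
>>>     val sc = new SparkContext("local[*]", "recs")
>>>
>>>     // Parse "user,item,rating" lines into MLlib Ratings.
>>>     val ratings = sc.textFile("hdfs:///ratings.csv").map { line =>
>>>       val Array(u, i, r) = line.split(',')
>>>       Rating(u.toInt, i.toInt, r.toDouble)
>>>     }
>>>
>>>     // rank = 10, iterations = 10, lambda = 0.01: placeholder values.
>>>     val model = ALS.train(ratings, 10, 10, 0.01)
>>>
>>>     // Score top 10 per user on the driver and hand each list to
>>>     // your store (e.g. a MongoDB upsert) -- placeholder here.
>>>     ratings.map(_.user).distinct().collect().foreach { u =>
>>>       val top10 = model.recommendProducts(u, 10)
>>>       // save (u, top10) to MongoDB
>>>     }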
>>>
>>> 500K items isn't all that large. Your requirements aren't driven just
>>> by items, though: the number of users and latent features matter too,
>>> as does how often you want to rebuild the model. I'm guessing you
>>> could get away with a handful of modern machines for a problem this
>>> size.
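>>>
>>> (Back-of-envelope on memory, assuming something like 50 latent
>>> features: the item-factor matrix is 500,000 x 50 doubles = 25M x 8
>>> bytes = 200 MB, and user factors scale the same way with user count,
>>> so the factors alone fit comfortably on a single machine.)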
>>>
>>>
>>> In a way, what you describe reminds me of Wibidata, since it built
>>> recommender-like solutions on top of data and results published to a
>>> NoSQL store. You might glance at the related OSS project Kiji
>>> (http://kiji.org/) for ideas about how to manage the schema.
>>>
>>> You should have a look at things like Nick's architecture for
>>> Graphflow, though it's more concerned with computing recommendations
>>> on the fly, and it describes a shift away from an architecture
>>> originally built around something like a NoSQL store:
>>>
>>> http://spark-summit.org/wp-content/uploads/2014/07/Using-Spark-and-Shark-to-Power-a-Realt-time-Recommendation-and-Customer-Intelligence-Platform-Nick-Pentreath.pdf
>>>
>>> This is also the kind of ground the Oryx project, which I've worked on
>>> personally, is intended to cover:
>>> https://github.com/OryxProject/oryx -- a layer on and around the core
>>> model building in Spark + Spark Streaming that provides a whole
>>> recommender (for example), down to the REST API.
>>>
>>> On Sun, Mar 15, 2015 at 10:45 AM, Shashidhar Rao
>>> <raoshashidhar...@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > Can anyone who has developed a recommendation engine suggest a
>>> > possible software stack for such an application?
>>> >
>>> > I am basically new to recommendation engines; so far I have found
>>> > Mahout and Spark MLlib.
>>> > I am thinking of the software stack below.
>>> >
>>> > 1. The user interacts through an Android app.
>>> > 2. REST API calls from the Android app to the app server to fetch
>>> > recommendations.
>>> > 3. Spark MLlib as the core recommendation engine.
>>> > 4. MongoDB as the database backend.
>>> >
>>> > I would like to know more about the Spark cluster configuration (how
>>> > many nodes, etc.) needed to calculate recommendations for 500,000
>>> > items. These items include products for day care, etc.
>>> >
>>> > Other software stack suggestions would also be very useful. It has to
>>> > run on machines from multiple vendors.
>>> >
>>> > Please suggest.
>>> >
>>> > Thanks
>>> > shashi
>>>
>>
>>
>
