If you are performing recommendations via a latent factor model, then I
highly recommend you look into "approximate nearest neighbors" methods.
At Spotify we batch-process top-N recommendations for 40M users against a
catalog of >40M items, but we avoid the naive O(n*m) process you are
describing by performing an approximate nearest neighbors search.  There
are a bunch of open source packages you can use, including our own
https://github.com/spotify/annoy, which uses random projections in your
latent factor space to build a forest of trees that supports very fast
nearest neighbor lookups.


On Fri, Jul 18, 2014 at 1:57 PM, Nick Pentreath <nick.pentre...@gmail.com>
wrote:

> Agree GPUs may be interesting for this kind of massively parallel linear
> algebra on reasonable size vectors.
>
> These projects might be of interest in this regard:
> https://github.com/BIDData/BIDMach
> https://github.com/BIDData/BIDMat
> https://github.com/dlwh/gust
>
> Nick
>
>
>
> On Fri, Jul 18, 2014 at 7:40 PM, m3.sharma <sharm...@umn.edu> wrote:
>
>> Thanks Nick, the real-time suggestion is good; we'll see if we can add
>> that to our deployment strategy. You are also correct that we may not
>> need recommendations for every user.
>>
>> We'll try adding more resources, as well as your suggestion of
>> broadcasting the item features, since currently they don't seem to be
>> huge.
>>
>> As both users and items will continue to grow in the future, I think a
>> few GPU nodes will suffice for serving faster recommendations after
>> learning the model with Spark. It would be great to have built-in GPU
>> support in Spark, so that these computations could leverage the GPU
>> capability of the nodes and perform the flops faster.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Large-scale-ranked-recommendation-tp10098p10183.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>
>
