There is a large-ish data structure in the Spark version of this algorithm. Each slave gets a copy of several BiMaps that translate your IDs into and out of Mahout IDs: one is created for user IDs and one for each item ID set, so a single action means 2 BiMaps. These are broadcast values, so enough executor memory must be available to hold them, and their size depends on how many distinct user and item IDs you have.
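As a rough illustration (not the actual Mahout code), here is a minimal Scala sketch of that idea, using Guava's HashBiMap as a stand-in for the ID dictionaries and made-up user/item data. It only shows the shape of the thing: each executor holds a full broadcast copy of the dictionaries and uses them to map external IDs to internal integer IDs and back.

// Minimal sketch, assuming Guava and Spark on the classpath; names and data are hypothetical.
import com.google.common.collect.HashBiMap
import org.apache.spark.{SparkConf, SparkContext}

object BiMapBroadcastSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("bimap-broadcast-sketch"))

    // One dictionary for user IDs and one per item ID set (two for a single action).
    val userDict = HashBiMap.create[String, Integer]()
    val itemDict = HashBiMap.create[String, Integer]()
    Seq("u1", "u2", "u3").zipWithIndex.foreach { case (id, i) => userDict.put(id, i) }
    Seq("iPad", "iPhone", "nexus").zipWithIndex.foreach { case (id, i) => itemDict.put(id, i) }

    // Broadcast values: every executor holds a full copy, so executor memory must be
    // large enough for all dictionaries; their size grows with the number of IDs.
    val userB = sc.broadcast(userDict)
    val itemB = sc.broadcast(itemDict)

    val interactions = sc.parallelize(Seq(("u1", "iPad"), ("u3", "nexus")))

    // External IDs -> internal integer IDs on the executors.
    val asInternalIds = interactions.map { case (u, i) =>
      (userB.value.get(u), itemB.value.get(i))
    }

    // Internal integer IDs -> external IDs again, via the inverse direction of the BiMap.
    val backToExternal = asInternalIds.map { case (u, i) =>
      (userB.value.inverse.get(u), itemB.value.inverse.get(i))
    }
    backToExternal.collect().foreach(println)

    sc.stop()
  }
}

The point of the sketch is only the memory accounting: because these are broadcast, every executor pays for the whole set of dictionaries, not a partition of them.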
On Dec 23, 2014, at 8:05 AM, Ted Dunning <[email protected]> wrote:

On Tue, Dec 23, 2014 at 7:39 AM, AlShater, Hani <[email protected]> wrote:

> @Ted, It is a 3-node small cluster for POC. The Spark executor is given 2g
> and yarn is configured accordingly. I am trying to avoid Spark memory caching.

Have you tried the map-reduce version?
