Hi Sebastian,
I've tried the svn trunk. Hadoop constantly complains about memory, e.g.
out-of-memory errors.
On the datanode there are 4 physical cores and, with hyper-threading, 16
logical cores, so I set --numThreadsPerSolver to 16, and that seems to
cause the memory problem.
How did you set your
Hi JU,
the job creates an OpenIntObjectHashMap<Vector> holding the feature
vectors as DenseVectors. In one map-job, it is filled with the
user-feature vectors, in the next one with the item-feature vectors.
I used 4 gigabytes for a dataset with 1.8M users (using 20 features),
so I guess that
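For orientation, here is a minimal, hypothetical sketch of the structure described above: an OpenIntObjectHashMap<Vector> keyed by user id, holding a DenseVector of 20 features for each of 1.8M users. Only those two numbers come from this thread; everything else is made up for illustration, and this is not code from the job itself.

import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.map.OpenIntObjectHashMap;

public class FeatureMatrixFootprint {
  public static void main(String[] args) {
    int numUsers = 1800000;   // dataset size mentioned above
    int numFeatures = 20;

    // one dense feature vector per user id, as in the map-side cache
    OpenIntObjectHashMap<Vector> userFeatures =
        new OpenIntObjectHashMap<Vector>(numUsers);
    for (int userID = 0; userID < numUsers; userID++) {
      userFeatures.put(userID, new DenseVector(new double[numFeatures]));
    }

    // raw payload only: 1.8M vectors * 20 doubles * 8 bytes ~ 288 MB;
    // per-entry object and hash-table overhead multiplies this, which is
    // why each mapper needs a heap of several gigabytes
    double rawMegabytes = (double) numUsers * numFeatures * 8 / (1024 * 1024);
    System.out.printf("raw feature doubles: %.0f MB%n", rawMegabytes);
  }
}

(Running the sketch itself needs a heap of a few gigabytes, which mirrors the mapper requirement.)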
I concur with everything that you state. In an ideal world, we would have a
framework that offers a well-implemented hybrid hash-join [1] that takes
advantage of all available memory and gracefully falls back to disk once the
amount of memory is not enough, such as the one used by Stratosphere [2].
Best,
Thanks again, Sebastian and Seon. I set -Xmx4000m for mapred.child.java.opts
and 8 threads for each mapper. Now the job runs smoothly and the whole
factorization finishes in 45 minutes. With your settings I think it would be
even faster.
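For reference, a hedged sketch of how those settings could be passed to the factorization job programmatically; the paths, iteration count and lambda below are placeholders, not the actual values I used:

import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob;

public class RunFactorization {
  public static void main(String[] args) throws Exception {
    ToolRunner.run(new ParallelALSFactorizationJob(), new String[] {
        "-Dmapred.child.java.opts=-Xmx4000m", // 4 GB heap per child JVM
        "--input", "/path/to/ratings",        // placeholder path
        "--output", "/path/to/als-output",    // placeholder path
        "--numFeatures", "20",
        "--numIterations", "10",              // placeholder value
        "--lambda", "0.065",                  // placeholder value
        "--numThreadsPerSolver", "8"          // 8 solver threads per mapper
    });
  }
}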
One more thing is that the RecommenderJob is kind of slow (for all
Hi JU,
I reworked the RecommenderJob in a similar way to the ALS job. Can you
give it a try?
You have to try the patch from
https://issues.apache.org/jira/browse/MAHOUT-1169
It introduces a new parameter to RecommenderJob called --numThreads. The
configuration of the job should be done similarly to
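For illustration, a hedged sketch of what an invocation with the new parameter could look like; apart from --numThreads, which the patch adds, the option names and paths here are my assumptions about the job's usual arguments, not something stated in this thread:

import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.cf.taste.hadoop.als.RecommenderJob;

public class RunRecommender {
  public static void main(String[] args) throws Exception {
    ToolRunner.run(new RecommenderJob(), new String[] {
        "-Dmapred.child.java.opts=-Xmx4000m",       // same heap setting as before
        "--input", "/path/to/ratings",              // placeholder: original ratings
        "--userFeatures", "/path/to/als-output/U",  // placeholder: user factors from ALS
        "--itemFeatures", "/path/to/als-output/M",  // placeholder: item factors from ALS
        "--numRecommendations", "10",               // placeholder value
        "--maxRating", "5",                         // placeholder value
        "--numThreads", "8",                        // the new parameter from the patch
        "--output", "/path/to/recommendations"      // placeholder path
    });
  }
}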