Re: ALS-WR on Million Song dataset

2013-03-20 Thread Han JU
Hi Sebastian, I've tried the svn trunk. Hadoop constantly complains about memory, with out-of-memory errors. On the datanode there are 4 physical cores, and with hyper-threading it has 16 logical cores, so I set --numThreadsPerSolver to 16, and that seems to cause the memory problem. How did you set your …

Re: ALS-WR on Million Song dataset

2013-03-20 Thread Sebastian Schelter
Hi JU, the job creates an OpenIntObjectHashMap<Vector> holding the feature vectors as DenseVectors. In one map job, it is filled with the user feature vectors, in the next one with the item feature vectors. I used 4 gigabytes for a dataset with 1.8M users (using 20 features), so I guess that …
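A minimal sketch of the structure Sebastian describes, using the Mahout math classes he names (the loading loop and the dataset sizes are illustrative, not the actual job code):

    import org.apache.mahout.math.DenseVector;
    import org.apache.mahout.math.Vector;
    import org.apache.mahout.math.map.OpenIntObjectHashMap;

    public class FeatureMapSketch {
      public static void main(String[] args) {
        int numFeatures = 20;
        // Feature vectors held fully in memory inside each mapper.
        OpenIntObjectHashMap<Vector> features = new OpenIntObjectHashMap<Vector>();
        for (int userID = 0; userID < 1800000; userID++) {
          // In the real job the vectors are read from the factorization
          // output on HDFS; zeros here only illustrate the shape.
          features.put(userID, new DenseVector(new double[numFeatures]));
        }
        // Back-of-the-envelope: 1.8M vectors * 20 doubles * 8 bytes is
        // roughly 288 MB of raw values, plus object and hash-table
        // overhead, which is why a 4 GB heap fits comfortably.
        System.out.println("held " + features.size() + " feature vectors");
      }
    }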

Re: ALS-WR on Million Song dataset

2013-03-20 Thread Sebastian Schelter
I concur with everything that you state. In an ideal world, we would have a framework that offers a well-implemented hybrid hash join [1], one that takes advantage of all available memory and gracefully falls back to disk once memory runs out, such as the one used by Stratosphere [2]. Best,
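For context, here is a bare-bones build/probe hash join in plain Java, with no spilling; the hybrid variant [1] additionally partitions both inputs and moves partitions that exceed memory to disk, which is what lets it degrade gracefully instead of dying with an OutOfMemoryError:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class HashJoinSketch {
      public static void main(String[] args) {
        // Build phase: load the smaller input into a hash table.
        Map<Integer, String> build = new HashMap<Integer, String>();
        build.put(1, "featureVectorOfUser1");
        build.put(2, "featureVectorOfUser2");

        // Probe phase: stream the larger input and look up matches.
        int[] probeKeys = {2, 1, 2};
        List<String> joined = new ArrayList<String>();
        for (int key : probeKeys) {
          String match = build.get(key);
          if (match != null) {
            joined.add(key + " -> " + match);
          }
        }
        System.out.println(joined);
      }
    }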

Re: ALS-WR on Million Song dataset

2013-03-20 Thread Han JU
Thanks again Sebastian and Seon, I set -Xmx4000m for mapred.child.java.opts and 8 threads for each mapper. Now the job runs smoothly and the whole factorization finishes in 45 minutes. With your settings I think it should be even faster. One more thing: the RecommenderJob is kind of slow (for all …
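In client-side code, JU's settings correspond to something like the sketch below. The heap option and the mapred.child.java.opts key are as stated in the thread; the input/output paths and the remaining hyper-parameters are placeholders for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.ToolRunner;
    import org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob;

    public class RunFactorization {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // 4 GB heap for each child JVM, as in JU's setup.
        conf.set("mapred.child.java.opts", "-Xmx4000m");
        ToolRunner.run(conf, new ParallelALSFactorizationJob(), new String[] {
            "--input", "/msd/ratings",          // placeholder path
            "--output", "/msd/als",             // placeholder path
            "--numFeatures", "20",
            "--numIterations", "10",            // illustrative value
            "--lambda", "0.065",                // illustrative value
            "--numThreadsPerSolver", "8"        // 8 threads per mapper, as above
        });
      }
    }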

Re: ALS-WR on Million Song dataset

2013-03-20 Thread Sebastian Schelter
Hi JU, I reworked the RecommenderJob in a similar way to the ALS job. Can you give it a try? You will have to apply the patch from https://issues.apache.org/jira/browse/MAHOUT-1169. It introduces a new parameter to RecommenderJob called --numThreads. The configuration of the job should be done similarly to …
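Assuming the patch is applied, running the reworked job would presumably look like the sketch below; --numThreads is the parameter MAHOUT-1169 introduces, the other flag names are a best recollection of the ALS RecommenderJob options, and all paths and values are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.ToolRunner;
    import org.apache.mahout.cf.taste.hadoop.als.RecommenderJob;

    public class RunRecommender {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same heap sizing as for the factorization job.
        conf.set("mapred.child.java.opts", "-Xmx4000m");
        ToolRunner.run(conf, new RecommenderJob(), new String[] {
            "--input", "/msd/ratings",            // placeholder paths
            "--userFeatures", "/msd/als/U/",
            "--itemFeatures", "/msd/als/M/",
            "--numRecommendations", "10",
            "--output", "/msd/recommendations",
            "--numThreads", "8"                   // new param from MAHOUT-1169
        });
      }
    }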