1) This is a Hadoop MapReduce job, so its speed depends on how many nodes you 
have in the cluster. Adding nodes should help.
2) Runtime also depends on the size of your data. How many users and items?

You set "--numIterations 1-". Is that a typo?

If #1 and #2 do not explain the runtime, try starting from the default values 
for all options and changing one at a time to see what is affecting it.
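
As a rough sketch of that one-at-a-time comparison, something like the script below could be used. It only uses the flags from the command quoted further down; the variant names and the per-variant output/temp directory names are made up for illustration, and it assumes "1-" was meant to be 10. By default it only prints the commands (DRY_RUN=1); set DRY_RUN=0 to actually run and time them.

```shell
#!/bin/sh
# Sketch only: time parallelALS while changing one option at a time.
# DRY_RUN=1 (default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Baseline options taken from the quoted command, assuming 10 iterations
# was intended. Left unquoted on use so it splits into separate arguments.
BASE="--input test.data --lambda 0.1 --numFeatures 50 --numIterations 10"

# run_variant NAME [extra options...]
# The output and temp directory names are hypothetical, one pair per variant.
run_variant() {
  name="$1"; shift
  set -- mahout parallelALS $BASE --output "output-$name" --tempDir "tmp-$name" "$@"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    start=$(date +%s)
    "$@"
    end=$(date +%s)
    echo "$name: $((end - start)) s wall clock"
  fi
}

run_variant baseline
# alpha is passed together with implicit feedback, as in the quoted command
run_variant implicit --implicitFeedback true --alpha 0.8
run_variant threads8 --numThreadsPerSolver 8
```

Note that the wall-clock time measured this way covers the whole job, including Hadoop job startup and data preparation, not only the ALS iterations.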

On Feb 11, 2015, at 7:36 AM, Hartwig Anzt <[email protected]> wrote:

Dear Mahout-Users,

I would like to use the ALS implementation available in Mahout as reference in 
a performance evaluation. The challenge for me, as I have little knowledge 
about the Mahout implementation, is to ensure that the exact same setup is 
running.

I want to obtain timings for the alternating least square iteration, using a 
defined test matrix and a dimension of 'f' for the small matrix used in the 
minimization process - I think it is called 'feature space dimension' in 
literature.

Assuming I want to run 10 iterations, use a feature space dimension of 50 and 8 
threads, is the following command correct, or does this include more than the 
ALS algorithm?

mahout parallelALS --input test.data --output output --lambda 0.1 
--implicitFeedback true --alpha 0.8 --numFeatures 50 --numIterations 1- 
--numThreadsPerSolver 8 --tempDir tmp

I am asking because the runtime seems to be quite large. In case the timing 
includes operations other than the ALS, is there a way to exclude them?

I appreciate any feedback! Thanks in advance, Hartwig
