Dear Pat,
Thank you very much for the detailed feedback! Indeed, the '1-' was just
a typo...
Does the runtime that I get as output include the setup, or is it the
computation time only?
Many thanks, Hartwig
On 2/11/15 10:55 AM, Pat Ferrel wrote:
1) This is a Hadoop MapReduce job, so the speed depends on how many nodes you
have in the cluster. Adding nodes should speed it up.
2) Runtime is also dependent on the size of your data. How many users and items?
You set "--numIterations 1-"; is that a typo?
If #1 and #2 do not explain the runtime, try starting from the default values
for all options and changing one at a time to see what affects it.
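The one-option-at-a-time approach can be sketched as a small Python script that builds the command lines. This is only an illustration: the baseline values are the ones from the question (with the typo read as 10), the alternative values are arbitrary picks, and the helper names are made up. Which options have defaults, and what those defaults are, should be checked against the job's own usage output.

```python
import shlex
import subprocess
import time

# Fixed part of the command, taken from the question in this thread.
BASE = ["mahout", "parallelALS", "--input", "test.data",
        "--output", "output", "--tempDir", "tmp"]

# Baseline option values: the ones from the question, typo read as 10.
BASELINE = {
    "--lambda": "0.1",
    "--implicitFeedback": "true",
    "--alpha": "0.8",
    "--numFeatures": "50",
    "--numIterations": "10",
    "--numThreadsPerSolver": "8",
}

# Hypothetical alternative values, one per option to probe.
ALTERNATIVES = {
    "--implicitFeedback": "false",
    "--numFeatures": "10",
    "--numIterations": "1",
    "--numThreadsPerSolver": "1",
}


def build_command(options):
    """Return the full argument list for one run."""
    cmd = list(BASE)
    for flag, value in options.items():
        cmd += [flag, value]
    return cmd


def one_at_a_time(baseline, alternatives):
    """Yield (label, options): the baseline first, then one changed option per run."""
    yield "baseline", dict(baseline)
    for flag, value in alternatives.items():
        changed = dict(baseline)
        changed[flag] = value
        yield f"{flag}={value}", changed


def time_command(cmd):
    """Run the command and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    for label, options in one_at_a_time(BASELINE, ALTERNATIVES):
        cmd = build_command(options)
        print(label, shlex.join(cmd))
        # On a machine with Mahout installed, replace the print with:
        # print(label, time_command(cmd))
```

Comparing runs with --numIterations 1 and 10 gives a rough idea of the fixed cost per job versus the cost per iteration. Hadoop jobs commonly refuse to write into an existing output directory, so each run may need its own --output path.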
On Feb 11, 2015, at 7:36 AM, Hartwig Anzt <[email protected]> wrote:
Dear Mahout-Users,
I would like to use the ALS implementation available in Mahout as a reference
in a performance evaluation. The challenge for me, as I have little knowledge
of the Mahout implementation, is to ensure that the exact same setup is
running.
I want to obtain timings for the alternating least squares (ALS) iteration,
using a defined test matrix and a dimension of 'f' for the small matrix used in
the minimization process. I think it is called the 'feature space dimension' in
the literature.
Assuming I want to run 10 iterations, use a feature space dimension of 50 and 8
threads, is the following command correct, or does this include more than the
ALS algorithm?
mahout parallelALS --input test.data --output output --lambda 0.1
--implicitFeedback true --alpha 0.8 --numFeatures 50 --numIterations 1-
--numThreadsPerSolver 8 --tempDir tmp
I am asking because the runtime seems to be quite large. In case the timing
includes operations other than the ALS, is there a way to exclude them?
I appreciate any feedback! Thanks in advance, Hartwig
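To make the distinction between setup time and iteration time concrete, here is a minimal pure-Python sketch of ALS for implicit feedback that times the two phases separately. It is not Mahout's code. It assumes the confidence-weighted formulation of Hu, Koren and Volinsky (confidence c = 1 + alpha * r, preference p = 1 where r > 0), which the --implicitFeedback and --alpha options suggest, and every function name below is made up for illustration. The parameters f, lam, alpha and iters play the roles of --numFeatures, --lambda, --alpha and --numIterations.

```python
import random
import time


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def index_ratings(ratings, n_users, n_items):
    """Turn (user, item, r) triples into per-user and per-item dictionaries."""
    by_user = [dict() for _ in range(n_users)]
    by_item = [dict() for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u][i] = r
        by_item[i][u] = r
    return by_user, by_item


def update_factors(rows, fixed, f, lam, alpha):
    """Recompute one side's factors while the other side (`fixed`) is held constant."""
    result = []
    for row in rows:
        # Normal equations: (sum_j c_j y_j y_j^T + lam I) x = sum_j c_j p_j y_j
        a = [[lam if i == k else 0.0 for k in range(f)] for i in range(f)]
        b = [0.0] * f
        for j, y in enumerate(fixed):  # dense loop over all columns, for clarity
            r = row.get(j, 0.0)
            c = 1.0 + alpha * r
            p = 1.0 if r > 0 else 0.0
            for i in range(f):
                b[i] += c * p * y[i]
                for k in range(f):
                    a[i][k] += c * y[i] * y[k]
        result.append(solve(a, b))
    return result


def objective(by_user, X, Y, lam, alpha):
    """Confidence-weighted squared error plus L2 regularization."""
    total = lam * sum(v * v for vec in X + Y for v in vec)
    for u, row in enumerate(by_user):
        for i, y in enumerate(Y):
            r = row.get(i, 0.0)
            pred = sum(xv * yv for xv, yv in zip(X[u], y))
            total += (1.0 + alpha * r) * ((1.0 if r > 0 else 0.0) - pred) ** 2
    return total


def run_als(ratings, n_users, n_items, f, lam, alpha, iters, seed=0):
    """Return (X, Y, setup_seconds, iteration_seconds)."""
    t0 = time.perf_counter()
    rng = random.Random(seed)
    by_user, by_item = index_ratings(ratings, n_users, n_items)
    X = [[rng.random() for _ in range(f)] for _ in range(n_users)]
    Y = [[rng.random() for _ in range(f)] for _ in range(n_items)]
    t1 = time.perf_counter()  # setup ends here
    for _ in range(iters):
        X = update_factors(by_user, Y, f, lam, alpha)
        Y = update_factors(by_item, X, f, lam, alpha)
    t2 = time.perf_counter()  # only the alternating updates are in t2 - t1
    return X, Y, t1 - t0, t2 - t1


if __name__ == "__main__":
    data = [(0, 0, 3.0), (0, 1, 1.0), (1, 1, 2.0), (2, 2, 5.0), (2, 0, 1.0)]
    X, Y, setup_s, iter_s = run_als(data, 3, 3, f=2, lam=0.1, alpha=0.8, iters=10)
    print(f"setup: {setup_s:.6f} s, iterations: {iter_s:.6f} s")
```

A wall-clock figure reported for a whole Hadoop job would correspond to more than the second number here, since job submission and reading or writing intermediate data have no counterpart in this single-process sketch.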