Re: reproducibility

2013-03-17 Thread Sean Owen
What's your question? ALS has a random starting point, which changes the results a bit. Not sure about KNN though. On Sun, Mar 17, 2013 at 3:03 AM, Koobas koo...@gmail.com wrote: Can anybody shed any light on the issue of reproducibility in Mahout, with and without Hadoop, specifically in the

Re: reproducibility

2013-03-17 Thread Koobas
I am asking the basic reproducibility question. If I run twice on the same dataset, with the same hardware setup, will I always get the same results? Or is there any chance that on two different runs, the same user will get slightly different suggestions? I am mostly revolving in the space of

Re: reproducibility

2013-03-17 Thread Sean Owen
If an algorithm has a stochastic/random element, no, it won't necessarily produce the same result, by design. If you can fix the seed of the random number generator, you should get the same result. Except that if the process is multi-threaded or distributed, even that doesn't guarantee it -- the
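
[Editor's illustration] A minimal sketch of the point about seeds, in plain Python rather than Mahout's own API. The function `random_start` is a hypothetical stand-in for a randomly initialised model such as the ALS starting point. The last two lines show one possible reason a parallel run can differ even with a fixed seed: floating-point addition is not associative, so the order in which partial results are combined can matter. That explanation is an assumption added here, not something stated in the truncated message above.

```python
import random


def random_start(seed, size=5):
    """Hypothetical stand-in for a randomly initialised model (not a Mahout call)."""
    rng = random.Random(seed)  # private generator, so the seed fully determines the output
    return [rng.random() for _ in range(size)]


# Same seed: the "random" starting point is identical on every run.
same_a = random_start(42)
same_b = random_start(42)

# Different seed: a different starting point, hence possibly different results.
other = random_start(43)

# Assumption for illustration: workers may combine partial sums in a different
# order from run to run, and floating-point addition is order-sensitive.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
```

Here `same_a == same_b` holds, `same_a` differs from `other`, and the two sums differ in their last digit, which is the kind of small discrepancy that can surface between otherwise identical runs.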

Re: reproducibility

2013-03-17 Thread Koobas
Understood. Thanks a lot. On Sun, Mar 17, 2013 at 9:57 AM, Sean Owen sro...@gmail.com wrote: If an algorithm has a stochastic/random element, no, it won't necessarily produce the same result, by design. If you can fix the seed of the random number generator, you should get the same result.

Re: What will be the LDAPrintTopics compatible/equivalent feature in Mahout-0.7?

2013-03-17 Thread 万代豊
Hi Jake. Because I have been busy with other housekeeping matters, I have not actually built Mahout 0.7 from the trunk code yet, but before doing so, I tried Mahout-0.6 so that I could run LDA in a straightforward way. I have successfully run LDA with a TF vector file as input, with 68 iterations across 43