Re: Is Apache Spark less accurate than Scikit Learn?

2015-01-22 Robin East
Hi, there are many different variants of gradient descent, mostly differing in what the step size is and how it might be adjusted as the algorithm proceeds. Also, if it uses a stochastic variant (as opposed to batch descent), then there are variations there too. I don’t know off-hand what MLlib’s
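To make the batch/stochastic and step-size distinction concrete, here is a minimal plain-Python sketch (not MLlib's or scikit-learn's code) of gradient descent on a one-feature least-squares line fit, with a pluggable step-size schedule and a switch between batch and stochastic updates. The data, the function names (`fit_line`, `constant`, `inv_sqrt`) and the 1/sqrt(t) decay are hypothetical choices for illustration only.

```python
import random


def constant(eta0, t):
    """Hypothetical schedule: the same step size at every iteration."""
    return eta0


def inv_sqrt(eta0, t):
    """Hypothetical schedule: step size shrinks as 1/sqrt(t)."""
    return eta0 / t ** 0.5


def fit_line(xs, ys, eta0, iters, schedule=constant, stochastic=False, seed=0):
    """Fit y ~ w*x + b by gradient descent on mean squared error."""
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    w, b = 0.0, 0.0
    for t in range(1, iters + 1):
        eta = schedule(eta0, t)
        # Batch descent uses every point; the stochastic variant uses one random point.
        batch = [rng.choice(data)] if stochastic else data
        m = len(batch)
        # Gradient of (1/m) * sum((w*x + b - y)^2) over the chosen points.
        gw = sum(2 * (w * x + b - y) * x for x, y in batch) / m
        gb = sum(2 * (w * x + b - y) for x, y in batch) / m
        w -= eta * gw
        b -= eta * gb
    return w, b


xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]  # hypothetical data, exactly y = 2x + 1
print(fit_line(xs, ys, 0.1, 500))                     # batch, constant step
print(fit_line(xs, ys, 0.1, 500, schedule=inv_sqrt))  # batch, decaying step
print(fit_line(xs, ys, 0.05, 3000, stochastic=True))  # stochastic, constant step
```

With the same starting step size and the same 500 iterations, the decaying schedule is still measurably off the true line while the constant one has converged, which is one way two correctly working optimizers can disagree on a fixed iteration budget.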

2015-01-21 Robin East
I don’t get those results. I get an MSE of 0.14 for Spark and 0.85 for scikit-learn. The scikit-learn MSE is that high because of the very low eta0 setting. Tweak that to 0.1 and push iterations to 400 and you get an MSE ~= 0. Of course, the coefficients are then both ~1 and the intercept ~0. Similarly if you change the
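A rough way to see why eta0 and the iteration count matter so much: the sketch below runs per-sample SGD with an inverse-scaling step size eta0 / t**0.25, loosely modelled on the "invscaling" schedule that scikit-learn's SGDRegressor uses by default (this is not its code), over hypothetical data generated exactly by y = x1 + x2. The original eta0 is not shown in this excerpt, so 0.0001 is a hypothetical stand-in for a "very low" value; `sgd_fit` and `mse` are hypothetical helper names.

```python
import random


def sgd_fit(X, y, eta0, epochs, power_t=0.25, seed=0):
    """Per-sample SGD for y ~ w.x + b with step size eta0 / t**power_t.

    t counts individual updates. Assumed schedule for illustration,
    not scikit-learn's implementation.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    t = 1
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a fresh random order each epoch
        for i in order:
            xi, yi = X[i], y[i]
            eta = eta0 / t ** power_t
            # Gradient of 0.5 * (prediction - target)^2 is err * x (and err for b).
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            w = [wj - eta * err * xj for wj, xj in zip(w, xi)]
            b -= eta * err
            t += 1
    return w, b


def mse(X, y, w, b):
    """Mean squared error of the linear model (w, b) on (X, y)."""
    preds = [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]
    return sum((p - yi) ** 2 for p, yi in zip(preds, y)) / len(y)


# Hypothetical toy data: y = x1 + x2, so coefficients 1 and 1, intercept 0.
X = [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)]
y = [x1 + x2 for x1, x2 in X]

w_lo, b_lo = sgd_fit(X, y, eta0=0.0001, epochs=5)  # very low step size
w_hi, b_hi = sgd_fit(X, y, eta0=0.1, epochs=400)   # eta0 = 0.1, 400 passes
print(mse(X, y, w_lo, b_lo))  # stays close to the all-zeros baseline of 2.0
print(mse(X, y, w_hi, b_hi))  # essentially 0
print(w_hi, b_hi)             # coefficients ~1, intercept ~0
```

With the tiny step the weights barely move from zero, so the error stays near that of predicting 0 everywhere; with eta0 = 0.1 and 400 passes the fit recovers the generating coefficients.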

2015-01-21 Jacques Heunis
Ah, I see, thanks! I was just confused because, given the same configuration, I would have thought that Spark and scikit-learn would give more similar results, but I guess this is simply not the case (as in your example, in order to get Spark to give an MSE sufficiently close to scikit-learn's you have to give