Hi,
I am running LogisticRegressionWithSGD in Spark 1.4.1, and it always runs
the full 100 iterations during training (the default). It never stops
early on a convergence criterion. Shouldn't the convergence criterion for
SGD be based on the difference in log loss, or the difference in accuracy
on a held-out test set?
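For reference, the kind of stopping rule described above (stop when the log loss stops improving, rather than always running a fixed number of iterations) can be implemented outside the library. A minimal pure-Python sketch, not Spark code, with all names illustrative:

```python
import math

def log_loss(w, b, data):
    """Mean logistic log loss over (features, label) pairs."""
    total = 0.0
    for x, y in data:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        p = 1.0 / (1.0 + math.exp(-z))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def train(data, lr=0.5, max_iter=100, tol=1e-6):
    """Batch gradient descent that stops early once the log-loss
    improvement between iterations drops below `tol`, instead of
    always running the full max_iter steps."""
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    prev_loss = log_loss(w, b, data)
    for it in range(1, max_iter + 1):
        gw, gb = [0.0] * n, 0.0
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # p - y
            for j, xj in enumerate(x):
                gw[j] += err * xj
            gb += err
        w = [wi - lr * gj / len(data) for wi, gj in zip(w, gw)]
        b -= lr * gb / len(data)
        loss = log_loss(w, b, data)
        if abs(prev_loss - loss) < tol:
            return w, b, it  # converged: loss stopped improving
        prev_loss = loss
    return w, b, max_iter  # hit the iteration cap

# Tiny separable example: label is 1 exactly when the feature is positive.
data = [([x], 1 if x > 0 else 0)
        for x in (-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0)]
w, b, iters = train(data)
```

The same check could be run against a held-out set instead, by evaluating log_loss on validation data after each iteration rather than on the training data.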
Yes, we are close to having more than 2 billion users. In this case, what
is the best way to handle this?
Thanks,
Nishanth
On Fri, Jan 9, 2015 at 9:50 PM, Xiangrui Meng wrote:
> Do you have more than 2 billion users/products? If not, you can pair
> each user/product id with an integer (check RDD.zipW
types. This would be the
best solution, but it requires some refactoring of the code.
Best,
Xiangrui
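As a side note on the integer-pairing workaround quoted above (which applies when there are fewer than 2^31 distinct ids): it amounts to building a dense raw-id -> Int index. In Spark this would be done distributed, e.g. with RDD.zipWithUniqueId over the distinct ids, but a single-machine Python sketch of the idea looks like this (all names illustrative):

```python
def build_id_index(ids):
    """Assign each distinct raw id (string, UUID, long, ...) a dense
    0-based integer index. Single-machine stand-in for the id pairing
    Spark would produce distributed over the distinct ids."""
    index = {}
    for raw in ids:
        if raw not in index:
            index[raw] = len(index)
    return index

# Raw ids of any type get mapped to compact ints before training ...
raw_user_ids = ["u-9223372036854775807", "u-42", "u-42", "u-abc"]
user_index = build_id_index(raw_user_ids)

# ... and a reverse index maps predictions back to the raw ids.
reverse_index = {i: raw for raw, i in user_index.items()}
```

A 32-bit index caps out at 2^31 - 1 distinct ids, which is why, past 2 billion users, supporting wider id types in the library itself is the cleaner fix.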
On Wed, Jan 14, 2015 at 1:04 PM, Nishanth P S wrote:
> Yes, we are close to having more than 2 billion users. In this case, what
> is the best way to handle this?
>
> Thanks,
> Nishanth