Re: news20-binary classification with LogisticRegressionWithSGD

Makoto Yui Thu, 19 Jun 2014 02:02:30 -0700

Xiangrui and Debasish,

(2014/06/18 6:33), Debasish Das wrote:

I did run pretty big sparse dataset (20M rows, 3M sparse features) and I
got 100 iterations of SGD running in 200 seconds...10 executors each
with 16 GB memory...

I figured out what the problem was: "spark.akka.frameSize" was too large. After setting spark.akka.frameSize=10, it worked for the news20 dataset.

Execution was slow for the larger KDD Cup 2012, Track 2 dataset (235M+ records with 16.7M+ (2^24) sparse features, about 33.6GB) due to the sequential aggregation of dense vectors on a single driver node.


Aggregation took about 7.6 minutes per iteration.
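
To give a rough sense of why the driver-side aggregation is heavy, here is a back-of-envelope sketch in Python. It assumes each partition returns one dense vector of 8-byte doubles with 2^24 entries, and that the driver merges them one after another. The partition count (256) is purely hypothetical and is not the number used in the run above; the function names are made up for illustration.

```python
# Back-of-envelope estimate of the data a single driver has to merge per
# iteration when every partition returns one dense gradient vector.
# Assumptions (not measured): 8-byte doubles, one vector per partition.

NUM_FEATURES = 2 ** 24      # 16,777,216 features, as quoted for KDD Cup 2012 Track 2
BYTES_PER_DOUBLE = 8        # assumption: float64 entries


def dense_vector_bytes(num_features, bytes_per_value=BYTES_PER_DOUBLE):
    """Size in bytes of one dense vector with num_features entries."""
    return num_features * bytes_per_value


def driver_bytes_per_iteration(num_partitions, num_features):
    """Total bytes the driver receives if each partition sends one dense vector."""
    return num_partitions * dense_vector_bytes(num_features)


vector_mib = dense_vector_bytes(NUM_FEATURES) / 2 ** 20

EXAMPLE_PARTITIONS = 256    # hypothetical partition count, for illustration only
total_gib = driver_bytes_per_iteration(EXAMPLE_PARTITIONS, NUM_FEATURES) / 2 ** 30

print("one dense vector: %.1f MiB" % vector_mib)
print("%d partitions: %.1f GiB per iteration" % (EXAMPLE_PARTITIONS, total_gib))
```

Under these assumptions the per-iteration volume at the driver grows linearly with the number of partitions, even though each input row is sparse.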

Thanks,
Makoto
