Xiangrui and Debasish,

(2014/06/18 6:33), Debasish Das wrote:
I did run pretty big sparse dataset (20M rows, 3M sparse features) and I
got 100 iterations of SGD running in 200 seconds...10 executors each
with 16 GB memory...

I figured out what the problem was: "spark.akka.frameSize" was too large. After setting spark.akka.frameSize=10, it worked for the news20 dataset.

Execution was slow on the larger KDD Cup 2012, Track 2 dataset (235M+ records with 16.7M+ (2^24) sparse features, about 33.6GB) because the dense vectors are aggregated sequentially on a single driver node.

Aggregation took about 7.6 minutes per iteration.
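
For a rough sense of why the driver-side aggregation is heavy, here is a back-of-the-envelope sketch. It assumes each partition sends one dense vector of 8-byte doubles to the driver per iteration. The partition count below is purely hypothetical (I have not stated the real one), and the function names are mine, not Spark's.

```python
# Back-of-the-envelope estimate of the data the driver must aggregate.
# Assumptions (not measured): 8-byte doubles, one dense vector per partition.

NUM_FEATURES = 2 ** 24      # 16.7M+ features, as in the KDD Cup 2012 Track 2 run
BYTES_PER_DOUBLE = 8        # assumed element width


def dense_vector_bytes(num_features, bytes_per_value=BYTES_PER_DOUBLE):
    """Size in bytes of one dense vector with num_features entries."""
    return num_features * bytes_per_value


def driver_bytes_per_iteration(num_partitions, num_features=NUM_FEATURES):
    """Total bytes received by the driver if every partition sends one dense vector."""
    return num_partitions * dense_vector_bytes(num_features)


if __name__ == "__main__":
    one_vector_mib = dense_vector_bytes(NUM_FEATURES) / 2 ** 20
    print(f"one dense vector: {one_vector_mib:.0f} MiB")  # prints "one dense vector: 128 MiB"

    hypothetical_partitions = 100  # illustrative value only
    total_gib = driver_bytes_per_iteration(hypothetical_partitions) / 2 ** 30
    print(f"{hypothetical_partitions} partitions: {total_gib:.1f} GiB per iteration")
```

So a single dense vector is already 128 MiB under these assumptions, and the driver has to receive and sum one of them per partition, one after another.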

Thanks,
Makoto
