I am trying to measure how Spark standalone cluster performance scales out 
across multiple machines. I ran a test training an SVM model, which is heavy 
on in-memory computation. I measured the run time on a Spark standalone 
cluster of 1 to 3 nodes; the results are as follows:

1 node: 35 minutes
2 nodes: 30.1 minutes
3 nodes: 30.8 minutes

So the speed does not seem to increase much with more machines. I know there 
is overhead for coordinating tasks among different machines, but it seems to 
me the overhead is over 30% of the total run time.

Is this typical? Does anybody see a significant performance increase with 
more machines? Is there anything I can tune in my Spark cluster to make it 
scale out with more machines?
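For context, a minimal sketch of the kind of submit-time settings I have been 
experimenting with (the master URL, memory, and core counts below are 
placeholders, not my actual configuration):

```shell
# Hypothetical spark-submit invocation for the SVM training job.
# spark://master:7077, 8G, 12, and train_svm.py are illustrative
# placeholders; only the flag names come from the standard Spark CLI.
spark-submit \
  --master spark://master:7077 \
  --executor-memory 8G \
  --total-executor-cores 12 \
  train_svm.py

# Inside the job, the training RDD can be repartitioned so that the
# partition count is a small multiple of the total cores, e.g.:
#   data = data.repartition(24)
# Too few partitions can leave added nodes idle, which might explain
# why extra machines do not help.
```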

Thanks
Ningjun
