Hi,
Previously we applied the SVM algorithm in MLlib to 5 million records (600 MB), and it took more than 25 minutes to finish. We are using Spark 1.0 and ran the program on a 4-node cluster; each node has 4 CPU cores and 11 GB of RAM. The 5 million records contain only two distinct records (one positive and one negative); all the others are duplicates. Does anyone have any idea why it takes so long on such a small data set?

Thanks,
Best,
Peng
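
P.S. To put the numbers above in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted in this message. Treating 1 MB as 10^6 bytes and assuming the records are spread evenly across all cores are simplifications for illustration, not measurements from the cluster.

```python
# Figures quoted in the message above.
records = 5_000_000
data_bytes = 600 * 10**6      # assumption: 1 MB = 10**6 bytes
nodes = 4
cores_per_node = 4
ram_gb_per_node = 11

# Derived quantities (simple arithmetic, no measurements).
bytes_per_record = data_bytes / records       # average size of one record
total_cores = nodes * cores_per_node          # cores across the cluster
total_ram_gb = nodes * ram_gb_per_node        # RAM across the cluster
# Assumption: records are split evenly across all cores.
records_per_core = records / total_cores

print(f"average bytes per record: {bytes_per_record:.0f}")
print(f"total cores: {total_cores}, total RAM: {total_ram_gb} GB")
print(f"records per core (even split): {records_per_core:.0f}")
```

By this estimate the whole data set is a small fraction of the cluster's total memory, which is why the 25-minute run time is surprising.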