Hi, I am training a RandomForest regression model on Spark 1.6.1 (EMR) and am interested in how best to scale it, e.g. more CPUs per instance, more memory per instance, more instances, etc.
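For context, the training call looks roughly like the sketch below, using the spark.ml RandomForestRegressor on a DataFrame read from Parquet. The path, column names, and hyperparameter values here are placeholders, not my exact settings:

    // Rough sketch of the current setup (Spark 1.6.x).
    // Path, column names, and hyperparameters are placeholders.
    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.ml.regression.RandomForestRegressor

    val sc = new SparkContext()          // provided by spark-submit on EMR
    val sqlContext = new SQLContext(sc)

    // Training set already assembled into a "features" vector column
    // plus a "label" column.
    val training = sqlContext.read.parquet("s3://my-bucket/training.parquet")

    val rf = new RandomForestRegressor()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setNumTrees(100)   // placeholder
      .setMaxDepth(10)    // placeholder
      .setMaxBins(32)

    val model = rf.fit(training)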
I'm currently using 32 m3.xlarge instances for a training set with 2.5 million rows, 1300 columns, and a total size of 31 GB (Parquet).

Thanks -- Franc