Hi All,

We would appreciate some tips on tuning Apache Spark for Random Forest
classification.
Currently, our model is configured like this:

featureSubsetStrategy = "all"
impurity = "gini"
maxBins = 32
maxDepth = 11
numClasses = 2
numTrees = 100
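
For reference, here is a minimal sketch of how such a model is built with
MLlib's RandomForest.trainClassifier (trainingData is a placeholder
RDD[LabeledPoint], and the empty categoricalFeaturesInfo map assumes all
attributes are continuous):

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.model.RandomForestModel
import org.apache.spark.rdd.RDD

// trainingData is a placeholder for the labeled training set.
def buildModel(trainingData: RDD[LabeledPoint]): RandomForestModel =
  RandomForest.trainClassifier(
    trainingData,
    numClasses = 2,
    categoricalFeaturesInfo = Map[Int, Int](), // assumed: no categorical features
    numTrees = 100,
    featureSubsetStrategy = "all",
    impurity = "gini",
    maxDepth = 11,
    maxBins = 32)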

We are running Spark 1.5.1 on a standalone cluster.

1 master and 2 worker nodes, each with 32 GB of RAM and 4 cores.
With the settings above, classification takes about 440 ms.
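
In case the deployment matters, we create the context roughly like this
(a sketch only; the master URL and resource settings are placeholders,
not our literal configuration):

import org.apache.spark.{SparkConf, SparkContext}

// Placeholder standalone-cluster settings; the master host, executor
// memory, and core count are assumptions for the sketch.
val conf = new SparkConf()
  .setAppName("RandomForestClassification")
  .setMaster("spark://master-host:7077")
  .set("spark.executor.memory", "24g")
  .set("spark.cores.max", "8")
val sc = new SparkContext(conf)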

When we increase the number of trees to 500, classification already takes
about 8 seconds. We tried reducing the depth, but then the error rate gets
worse. We have around 246 attributes.
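
We measure classification time with a simple harness along these lines
(a sketch; testData is a placeholder RDD[LabeledPoint], and count()
forces the lazy RDD to evaluate):

// Hypothetical timing harness around model scoring.
val start = System.nanoTime()
val predictions = model.predict(testData.map(_.features))
predictions.count() // force evaluation; RDD transformations are lazy
val elapsedMs = (System.nanoTime() - start) / 1e6
println(s"Classification took $elapsedMs ms")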

We are probably doing something wrong. Any ideas on how we could improve
the performance?