So it looks like you're increasing numTrees by 5x (100 to 500) but seeing
roughly an 18x increase in runtime (440 ms to 8 sec), correct? That's well
beyond linear scaling, so the extra trees alone probably don't explain it.

Did you monitor the cluster's resources while the job ran (memory usage,
shuffle spill, disk I/O, etc.)? The Spark web UI is the quickest place to
check.

You may need more Workers.
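
If resources do turn out to be the bottleneck, the first knobs on a
standalone cluster are executor memory and total cores. A minimal sketch of
the relevant settings (the values are illustrative for your 32GB / 4-core
workers, not recommendations, and "spark://master:7077" is a placeholder for
your actual master URL):

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative standalone-cluster resource settings; check the Stages
    // tab of the Spark UI for shuffle spill before changing anything.
    val conf = new SparkConf()
      .setMaster("spark://master:7077")     // placeholder master URL
      .setAppName("rf-tuning")
      .set("spark.executor.memory", "24g")  // leave headroom for the OS
      .set("spark.cores.max", "8")          // cap on total cores across workers
    val sc = new SparkContext(conf)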

On Tue, Dec 22, 2015 at 8:57 AM, Alexander Ratnikov <
ratnikov.alexan...@gmail.com> wrote:

> Hi All,
>
> It would be good to get some tips on tuning Apache Spark for Random
> Forest classification.
> Currently, we have a model that looks like:
>
> featureSubsetStrategy all
> impurity gini
> maxBins 32
> maxDepth 11
> numberOfClasses 2
> numberOfTrees 100
>
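> In code, that corresponds to a trainClassifier call roughly like the
> following (a sketch; our data preparation is omitted, and trainingData
> stands in for our RDD[LabeledPoint]):
>
>     import org.apache.spark.mllib.tree.RandomForest
>     import org.apache.spark.mllib.regression.LabeledPoint
>     import org.apache.spark.rdd.RDD
>
>     // trainingData: RDD[LabeledPoint] over our ~246 continuous features
>     val model = RandomForest.trainClassifier(
>       trainingData,
>       numClasses = 2,
>       categoricalFeaturesInfo = Map[Int, Int](),  // no categorical features
>       numTrees = 100,                             // 500 in the slow case
>       featureSubsetStrategy = "all",              // every feature at each split
>       impurity = "gini",
>       maxDepth = 11,
>       maxBins = 32)
>
>     // classifying one record walks every tree, so latency grows with numTrees
>     val prediction = model.predict(point.features)  // point: a hypothetical LabeledPoint
>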
> We are running Spark 1.5.1 as a standalone cluster.
>
> 1 Master and 2 Worker nodes.
> Each node has 32GB of RAM and 4 cores.
> The classification takes 440ms.
>
> When we increase the number of trees to 500, it already takes 8 seconds.
> We tried reducing the depth, but then the error rate is higher. We have
> around 246 attributes.
>
> We are probably doing something wrong. Any ideas on how we could improve
> the performance?


-- 

*Chris Fregly*
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA
http://spark.tc | http://advancedspark.com
