Hi, I have spent quite some time trying to debug an issue with the Random Forest algorithm on Spark 2.0.2. The input dataset is relatively large, around 600k rows (200 MB), but I use subsampling to keep each tree manageable. However, even with only one tree and a low sample rate of 0.05, the job hangs at one of the final stages (see attached screenshot). I have checked the logs on all executors and on the driver and found no trace of an error. Could it be a memory issue even though no error appears? The hang is somewhat sporadic, so I also wonder whether it could be a data issue that only occurs when the subsample happens to include some bad rows.
Please comment if you have a clue.

Morten

Screenshot: <http://apache-spark-user-list.1001560.n3.nabble.com/file/n28192/Sk%C3%A6rmbillede_2016-12-10_kl.png>

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Random-Forest-hangs-without-trace-of-error-tp28192.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.