Hi

I have spent quite some time trying to debug an issue with the Random Forest
algorithm on Spark 2.0.2. The input dataset is relatively large at around
600k rows and 200 MB, and I use subsampling to keep each tree manageable.
However, even with only 1 tree and a low subsampling rate of 0.05, the job
hangs at one of the final stages (see the attached screenshot). I have
checked the logs on all executors and on the driver and found no trace of an
error. Could it be a memory issue even though no error appears? The hang
does seem somewhat sporadic, so I also wondered whether it could be a data
issue that only surfaces when the subsample happens to include the bad rows.
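For reference, here is roughly the setup I am running (a minimal sketch, not
my actual code: I assume the ml API and a classifier here, the regression
variant is analogous, and the column names and the loadTrainingData helper
are placeholders):

    import org.apache.spark.ml.classification.RandomForestClassifier
    import org.apache.spark.sql.{DataFrame, SparkSession}

    val spark = SparkSession.builder().appName("rf-hang").getOrCreate()

    // Placeholder for however the ~600k-row / ~200 MB dataset is loaded;
    // it has a "label" column and an assembled "features" vector column.
    val training: DataFrame = loadTrainingData(spark)

    val rf = new RandomForestClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setNumTrees(1)           // even a single tree hangs
      .setSubsamplingRate(0.05) // 5% row sample per tree

    val model = rf.fit(training) // hangs at one of the final stages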

Please comment if you have a clue.

Morten

<http://apache-spark-user-list.1001560.n3.nabble.com/file/n28192/Sk%C3%A6rmbillede_2016-12-10_kl.png>
