I am running a job that consistently crashes the Spark driver, and I am unable to diagnose the cause. I am running on Databricks, but I am posting my question here in case I am doing something that is a clearly problematic operation in Spark.
I am trying to do a machine-learning-type task:

1) I simulate some data as an RDD of mllib LabeledPoints (8 partitions, 10,000 points per partition, each point with 40 dimensions).
2) I train a model on each partition, using a random forest library that I wrote myself. I use mapPartitions and end up with an RDD of RandomForests, one per partition.
3) I force computation of the RDD of RandomForests by calling count(). I also call checkpoint().
4) I generate a local test set of 1,000 points on the master node.
5) I do batch fitting of the test points. This procedure is a bit complicated, but for each batch (of 100 points), each partition must calculate 100 different 40x40 matrices, which are then communicated back to the master. I do this by calling broadcast on the batch and applying a map function to the RDD of forests trained in step 2. This creates an RDD of Array[(Array[Double], Array[Double])], which I then collect(). The result should total only a few megabytes.

From here things go wrong. The first batch goes slowly, taking approximately 1 minute. The following batches are faster, about 10 seconds each. Then the Spark driver crashes after about the 4th or 5th batch.

I have two questions:

1) Why might the first batch be particularly slow, given that I have already forced computation of the RDD that it depends on, and there is, in principle, no difference between any of the batches?
2) What might be causing the Spark driver to crash? The code runs fine when I am running in local mode.

Thanks in advance; I can provide more details if necessary.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Can-t-determine-cause-of-spark-driver-crash-tp24917.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
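A quick back-of-envelope check on the "few megabytes" figure in step 5. This is only a sketch of the raw numeric payload, assuming each of the 8 partitions sends back 100 results per batch and each result is dominated by a 40x40 matrix of doubles (the 8/100/40 figures come from the post; treating the payload as dense raw doubles is my assumption):

```python
# Rough size of one collected batch, ignoring JVM object headers and
# serialization overhead (assumption: payload is dominated by the matrices).
partitions = 8        # partitions in the RDD (from the post)
batch_size = 100      # points per batch (from the post)
dim = 40              # matrix dimension (from the post)
bytes_per_double = 8  # IEEE 754 double

payload_bytes = partitions * batch_size * dim * dim * bytes_per_double
print(payload_bytes / 1e6)  # 10.24 (MB)
```

So the raw doubles alone are on the order of 10 MB per collect(), before any JVM object or serialization overhead, which is consistent with the "few megabytes" estimate above but not negligible on the driver if batches accumulate.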