How many partitions now? By the way, which Spark version are you using? I checked your code and I don't understand why you broadcast vectors2: it is an RDD, and a broadcast variable should hold a driver-local value, not an RDD handle.
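As a sketch of what I mean (the names `sc`, `vectors`, and `distance` are illustrative, and this assumes the data you want to share is small enough to fit on the driver):

```scala
// An RDD is a distributed handle, not data; wrapping it in a broadcast
// does not ship its contents to the executors.

// Problematic: broadcasting the RDD handle itself
// val broadcastVector = sc.broadcast(vectors2)

// Instead: materialize a *small* dataset on the driver, then broadcast
// the local copy. collect() is only safe if it fits in driver memory.
val localVectors = vectors.collect()
val broadcastVectors = sc.broadcast(localVectors)

// Executors then read the read-only local copy via .value:
// otherRdd.map { x => distance(x, broadcastVectors.value.head) }
```

For a 126G dataset, collecting is obviously not an option, which is why I would drop the broadcast entirely and just persist the repartitioned RDD.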
var vectors2 = vectors.repartition(1000)
  .persist(org.apache.spark.storage.StorageLevel.MEMORY_AND_DISK_SER)
var broadcastVector = sc.broadcast(vectors2)

What is the total memory of your cluster? Does the dataset fit into memory? If not, you can try turning on `spark.rdd.compress`. The whole dataset is not small.

-Xiangrui

On Mon, Aug 25, 2014 at 11:46 PM, durin <m...@simon-schaefer.net> wrote:
> With a lower number of partitions, I keep losing executors during
> collect at KMeans.scala:283
> The error message is "ExecutorLostFailure (executor lost)".
> The program recovers by automatically repartitioning the whole dataset
> (126G), which takes very long and seems to only delay the inevitable
> failure.
>
> Is there a recommended solution to this issue?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Only-master-is-really-busy-at-KMeans-training-tp12411p12803.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> ---------------------------------------------------------------------
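P.S. To be concrete about the compression suggestion, here is a spark-defaults.conf fragment (the serializer line is an additional suggestion of mine, not something from your setup; Kryo generally serializes MLlib vectors more compactly than Java serialization):

```
# Compress serialized RDD partitions (helps MEMORY_AND_DISK_SER fit in memory,
# at the cost of extra CPU for compress/decompress)
spark.rdd.compress    true

# Optional: a more compact serializer for the persisted vectors
spark.serializer      org.apache.spark.serializer.KryoSerializer
```

The same properties can also be set on the SparkConf before creating the SparkContext.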