There are only 5 worker nodes, so please try to reduce the number of partitions to the number of available CPU cores. 1000 partitions is too many, because the driver needs to collect the task result from each partition.
-Xiangrui
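A minimal sketch of the sizing rule suggested above, written as plain Scala so it runs without a cluster. It assumes 8 cores per worker. The Ganglia figure of 800 total percent CPU quoted below hints at 8 cores per node, but the thread never states the per-worker core count. `PartitionSizing` and `targetPartitions` are hypothetical names. The commented lines show where the result would plug into the Spark job using `repartition`, `persist`, and MLlib's `KMeans.train(data, k, maxIterations)`, with `k` and `maxIterations` as placeholders.

```scala
// Hypothetical helper illustrating the "one partition per core" rule of thumb.
object PartitionSizing {
  /** One partition per available executor core, and never fewer than one. */
  def targetPartitions(numWorkers: Int, coresPerWorker: Int): Int =
    math.max(1, numWorkers * coresPerWorker)

  def main(args: Array[String]): Unit = {
    // Assumption: 8 cores per worker; the thread only states that there are 5 workers.
    val n = targetPartitions(numWorkers = 5, coresPerWorker = 8)

    // Each partition sends one task result back to the driver, so a stage
    // over n partitions hands the driver n results to merge instead of 1000.
    println(s"repartition to $n partitions")

    // In the actual Spark job (not runnable here without Spark on the classpath):
    //   import org.apache.spark.mllib.clustering.KMeans
    //   import org.apache.spark.storage.StorageLevel
    //   val data  = vectors.repartition(n).persist(StorageLevel.MEMORY_AND_DISK_SER)
    //   val model = KMeans.train(data, k, maxIterations)
  }
}
```

Instead of hand-computing `n`, `sc.defaultParallelism` usually reports the total executor core count in standalone mode (unless `spark.default.parallelism` is set), so it may be a simpler source for the same number.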
On Tue, Aug 19, 2014 at 1:41 PM, durin <m...@simon-schaefer.net> wrote:
> When trying to use KMeans.train with some large data and 5 worker nodes, it
> would fail due to BlockManagers shutting down because of a timeout. I was able
> to prevent that by adding
>
> spark.storage.blockManagerSlaveTimeoutMs 3000000
>
> to the spark-defaults.conf.
>
> However, with 1 million feature vectors, the stage takeSample at
> KMeans.scala:263 runs for about 50 minutes. In this time, about half of the
> tasks are done, then I lose the executors and Spark starts a new
> repartitioning stage.
>
> I also noticed that in the takeSample stage, a task runs for about
> 2.5 minutes until it suddenly finishes, and its duration (previously those
> 2.5 min) changes to 2s, with 0.9s GC time.
>
> The training data is supplied in this form:
> var vectors2 =
> vectors.repartition(1000).persist(org.apache.spark.storage.StorageLevel.MEMORY_AND_DISK_SER)
> var broadcastVector = sc.broadcast(vectors2)
>
> The 1000 partitions is something that could probably be optimized, but too
> few will cause OOM errors.
>
> Using Ganglia, I can see that the master node is the only one that is
> properly busy in terms of CPU, and that most of it (600-700 of 800 total
> percent CPU) is used by the master.
> The workers on each node only use 1 core, i.e. 100% CPU.
>
>
> What would be the most likely cause for such an inefficient use of the
> cluster, and how can I prevent it?
> Number of partitions, way of caching, ...?
>
> I'm trying to find out myself with tests, but ideas from someone with more
> experience are very welcome.
>
>
> Best regards,
> simn
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Only-master-is-really-busy-at-KMeans-training-tp12411.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>