Hi guys, I am new to Spark. When I run Spark KMeans (org.apache.spark.mllib.clustering.KMeans) on a small dataset, it works great. However, on a larger dataset with 1.5 million vectors, the job just hangs at some reduceByKey/collectAsMap stages (the attached image shows the corresponding UI).
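For context, my job follows the standard MLlib KMeans pattern. The sketch below is illustrative only and is not my exact code; the input path, k, and the iteration count are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    object KMeansJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("KMeansJob"))
        // Hypothetical input: one whitespace-separated numeric vector per line (~1.5 million rows).
        val vectors = sc.textFile("hdfs:///path/to/vectors")
          .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
          .cache()
        // Placeholder k and maxIterations.
        val model = KMeans.train(vectors, 100, 20)
        println("Within-set sum of squared errors: " + model.computeCost(vectors))
        sc.stop()
      }
    }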
UI screenshot: <http://apache-spark-user-list.1001560.n3.nabble.com/file/n16413/spark.png>

In the log file, I can see the errors below:

14/10/14 13:04:30 ERROR ConnectionManager: Corresponding SendingConnectionManagerId not found
14/10/14 13:04:30 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(server_name_here,32936)
14/10/14 13:04:30 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(server_name_here,32936)
14/10/14 13:04:30 INFO ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@4aeed0e6
java.nio.channels.CancelledKeyException
    at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
    at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:87)
    at java.nio.channels.SelectionKey.isConnectable(SelectionKey.java:336)
    at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:352)
    at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:116)
14/10/14 13:04:30 ERROR SendingConnection: Exception while reading SendingConnection to ConnectionManagerId(server_name_here,32936)
java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
    at org.apache.spark.network.SendingConnection.read(Connection.scala:397)
    at org.apache.spark.network.ConnectionManager$$anon$6.run(ConnectionManager.scala:176)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
14/10/14 13:04:30 INFO ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@2d584a4e
14/10/14 13:04:30 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(server_name_here,37767)
14/10/14 13:04:30 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(server_name_here,37767)
14/10/14 13:04:30 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(server_name_here,37767)
14/10/14 13:04:30 INFO ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@2d584a4e
java.nio.channels.CancelledKeyException
    at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:363)
    at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:116)
14/10/14 13:04:30 INFO ConnectionManager: Handling connection error on connection to ConnectionManagerId(server_name_here,32936)
14/10/14 13:04:30 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(server_name_here,32936)
14/10/14 13:04:30 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(server_name_here,32936)
14/10/14 13:04:30 ERROR CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@server_name_here:44765] -> [akka.tcp://spark@server_name_here:46406] disassociated! Shutting down.

Regarding the above errors, I searched online and tried increasing the following confs, but it still did not work:

    spark.worker.timeout=30000
    spark.akka.timeout=30000
    spark.akka.retry.wait=30000
    spark.akka.frameSize=10000
    spark.storage.blockManagerHeartBeatMs=30000

along with these spark-submit options:

    --driver-memory "2g" --executor-memory "2g" --num-executors 100

I am running spark-submit on YARN. The Spark version is 1.1.0, and Hadoop is 2.4.1.
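In case it is relevant, my understanding is that the same values can also be set programmatically on the driver's SparkConf instead of on the spark-submit command line. A minimal sketch of that (assuming a SparkConf-based driver setup; the values simply mirror the ones listed above) would be:

    import org.apache.spark.SparkConf

    // Sketch only: the settings above expressed through SparkConf instead of submit-time flags.
    val conf = new SparkConf()
      .setAppName("KMeansJob")  // hypothetical application name
      .set("spark.worker.timeout", "30000")
      .set("spark.akka.timeout", "30000")
      .set("spark.akka.retry.wait", "30000")
      .set("spark.akka.frameSize", "10000")
      .set("spark.storage.blockManagerHeartBeatMs", "30000")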
Could you please share some comments/insights? Thanks a lot.

Ray