Hi guys, I am new to Spark. When I run Spark KMeans (org.apache.spark.mllib.clustering.KMeans) on a small dataset, it works great. However, on a larger dataset with 1.5 million vectors, the job just hangs at some reduceByKey/collectAsMap stages (the attached image shows the corresponding UI).
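For context, my job follows the standard MLlib KMeans pattern. The sketch below is illustrative only and is not my exact code; the input path, k, and the iteration count are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    object KMeansJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("KMeansJob"))
        // Hypothetical input: one whitespace-separated numeric vector per line (~1.5 million rows).
        val vectors = sc.textFile("hdfs:///path/to/vectors")
          .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
          .cache()
        // Placeholder k and maxIterations.
        val model = KMeans.train(vectors, 100, 20)
        println("Within-set sum of squared errors: " + model.computeCost(vectors))
        sc.stop()
      }
    }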
UI screenshot: <http://apache-spark-user-list.1001560.n3.nabble.com/file/n16413/spark.png>

In the log file, I can see the errors below:

14/10/14 13:04:30 ERROR ConnectionManager: Corresponding SendingConnectionManagerId not found
14/10/14 13:04:30 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(server_name_here,32936)
14/10/14 13:04:30 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(server_name_here,32936)
14/10/14 13:04:30 INFO ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@4aeed0e6
java.nio.channels.CancelledKeyException
    at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
    at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:87)
    at java.nio.channels.SelectionKey.isConnectable(SelectionKey.java:336)
    at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:352)
    at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:116)
14/10/14 13:04:30 ERROR SendingConnection: Exception while reading SendingConnection to ConnectionManagerId(server_name_here,32936)
java.nio.channels.ClosedChannelException
    at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
    at org.apache.spark.network.SendingConnection.read(Connection.scala:397)
    at org.apache.spark.network.ConnectionManager$$anon$6.run(ConnectionManager.scala:176)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
14/10/14 13:04:30 INFO ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@2d584a4e
14/10/14 13:04:30 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(server_name_here,37767)
14/10/14 13:04:30 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(server_name_here,37767)
14/10/14 13:04:30 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(server_name_here,37767)
14/10/14 13:04:30 INFO ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@2d584a4e
java.nio.channels.CancelledKeyException
    at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:363)
    at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:116)
14/10/14 13:04:30 INFO ConnectionManager: Handling connection error on connection to ConnectionManagerId(server_name_here,32936)
14/10/14 13:04:30 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(server_name_here,32936)
14/10/14 13:04:30 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(server_name_here,32936)
14/10/14 13:04:30 ERROR CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@server_name_here:44765] -> [akka.tcp://spark@server_name_here:46406] disassociated! Shutting down.

Regarding the above errors, I searched online and tried increasing the following confs, but it still did not work:

    spark.worker.timeout=30000
    spark.akka.timeout=30000
    spark.akka.retry.wait=30000
    spark.akka.frameSize=10000
    spark.storage.blockManagerHeartBeatMs=30000

along with these spark-submit options:

    --driver-memory "2g" --executor-memory "2g" --num-executors 100

I am running spark-submit on YARN. The Spark version is 1.1.0, and Hadoop is 2.4.1.
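In case it is relevant, my understanding is that the same values can also be set programmatically on the driver's SparkConf instead of on the spark-submit command line. A minimal sketch of that (assuming a SparkConf-based driver setup; the values simply mirror the ones listed above) would be:

    import org.apache.spark.SparkConf

    // Sketch only: the settings above expressed through SparkConf instead of submit-time flags.
    val conf = new SparkConf()
      .setAppName("KMeansJob")  // hypothetical application name
      .set("spark.worker.timeout", "30000")
      .set("spark.akka.timeout", "30000")
      .set("spark.akka.retry.wait", "30000")
      .set("spark.akka.frameSize", "10000")
      .set("spark.storage.blockManagerHeartBeatMs", "30000")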
Could you please share some comments/insights? Thanks a lot.

Ray