Hi Xiangrui, I am using yarn-cluster mode. The current Hadoop cluster is configured to accept only "yarn-cluster" mode and does not allow "yarn-client" mode. I do not have the privileges to change that.
Without "k-means||" initialization, the job finished in 10 minutes. With "k-means||" initialization, it just hangs for almost an hour. I guess I can only go with "random" initialization in KMeans.
Thanks again for your help.
Ray