Spark 2.0.0-preview. We've got an app that uses a fairly big broadcast variable. We run it on a big EC2 instance, so deployment is in client mode. The broadcast variable is a massive Map[String, Array[String]].
At the end of saveAsTextFile, the output in the folder seems to be complete and correct (apart from the .crc files still being there), BUT the spark-submit process appears to be stuck removing the broadcast variable. The stuck logs look like this: http://pastebin.com/wpTqvArY
My last run sat there for 12 hours after finishing saveAsTextFile. I ran jstack on the driver process, and most threads are parked: http://pastebin.com/E29JKVT7
Full story: we used this code with Spark 1.5.0 and it worked, but then the data changed and something stopped fitting into Kryo's serialisation buffer. Increasing the buffer didn't help, so I had to disable the KryoSerializer. I tested it again and it hung. I then switched to 2.0.0-preview, and it seems to be the same issue.
I'm not quite sure what's even going on, given that there's almost no CPU activity and no output in the logs, yet the output is not finalised the way it used to be.
Would appreciate any help, thanks
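For readers trying to reproduce this, here is a minimal sketch of the kind of job described above. It is not the actual application code: the object name, the `enrich` helper, the tiny stand-in table, the input and output paths, and the `1g` buffer value are all hypothetical. It only shows the overall shape (broadcast a Map[String, Array[String]], use it in a map, call saveAsTextFile), with Kryo settings of the sort mentioned left commented out.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastJobSketch {
  // Hypothetical per-record logic: append the looked-up values to the key.
  def enrich(table: Map[String, Array[String]], key: String): String =
    key + "\t" + table.getOrElse(key, Array.empty[String]).mkString(",")

  def main(args: Array[String]): Unit = {
    val inputPath  = args(0) // hypothetical input location
    val outputPath = args(1) // hypothetical output folder

    val conf = new SparkConf().setAppName("broadcast-job-sketch")
    // Kryo settings of the kind the post describes trying and then disabling
    // (the buffer value is an assumption, not taken from the post):
    // conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // conf.set("spark.kryoserializer.buffer.max", "1g")

    val sc = new SparkContext(conf)
    try {
      // Tiny stand-in for the real, very large lookup table.
      val table: Map[String, Array[String]] = Map("a" -> Array("x", "y"))
      val lookup = sc.broadcast(table)

      sc.textFile(inputPath)
        .map(line => enrich(lookup.value, line.trim))
        .saveAsTextFile(outputPath)
    } finally {
      // Stop the context explicitly once the output has been written.
      sc.stop()
    }
  }
}
```

Submitted in client mode with spark-submit, a job of this shape exercises the same sequence as the report: the save completes, then broadcast cleanup and shutdown follow. Whether it hangs with a small table like the stand-in is untested here.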