Spark 2.0.0-preview

We've got an app that uses a fairly big broadcast variable. We run this on a
big EC2 instance, so deployment is in client mode. The broadcast variable is a
massive Map[String, Array[String]].

At the end of saveAsTextFile, the output in the folder seems to be complete
and correct (apart from .crc files still being there), BUT the spark-submit
process appears to be stuck removing the broadcast variable. The logs at the
point where it hangs look like this: http://pastebin.com/wpTqvArY

My last run kept going for 12 hours after finishing saveAsTextFile, just
sitting there. I ran jstack on the driver process; most threads are parked:
http://pastebin.com/E29JKVT7

Full story: We used this code with Spark 1.5.0 and it worked, but then the
data changed and something stopped fitting into Kryo's serialisation buffer.
Increasing it didn't help, so I had to disable the KryoSerializer. Tested it
again - it hung. Switched to 2.0.0-preview - seems like the same issue.
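
For anyone trying to reproduce this, here is a rough sketch of the two
serialiser setups described above. The app name and the 512m buffer size are
made-up placeholders, not the values from our job; the property names are the
standard Spark ones as far as I know.

```scala
import org.apache.spark.SparkConf

object SerializerConfSketch {
  // Kryo enabled with a larger maximum buffer.
  // "512m" is an illustrative value only, not the one used in the real job.
  def withKryo(): SparkConf =
    new SparkConf()
      .setAppName("broadcast-hang-repro") // hypothetical app name
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryoserializer.buffer.max", "512m")

  // Kryo "disabled": fall back to Spark's default Java serialisation.
  def withJavaSerializer(): SparkConf =
    new SparkConf()
      .setAppName("broadcast-hang-repro") // hypothetical app name
      .set("spark.serializer", "org.apache.spark.serializer.JavaSerializer")
}
```

The same properties can also be passed to spark-submit via --conf instead of
being set in code.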

I'm not quite sure what's even going on, given that there's almost no CPU
activity and no output in the logs, yet the output is not finalised the way
it used to be.

Would appreciate any help, thanks.