Hi, I have a Spark job that hangs for around 7 hours or more, until Autosys kills it on timeout. The data is not huge, and I am fairly sure it is stuck in GC, but I can't find which part of my code causes it. I reuse almost all variables and try to minimize creating local objects, though I can't avoid creating many String objects in order to update DataFrame values. When I look at the live thread debug page for the executor running the job, I see the running/waiting threads attached below. Please guide me on how to find which waiting thread is the culprit preventing my job from finishing. My code does a DataFrame groupBy on around 8 fields and also calls coalesce(1) twice, so in the UI I can see it shuffling huge amounts of data, on the order of GBs per executor.
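For reference, here is a simplified sketch of the shape of the job I described; the column names, aggregation, and paths are placeholders, not my real code:

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

// Simplified shape of the job -- column and path names are placeholders.
val sqlContext = new SQLContext(sc)
val df = sqlContext.read.parquet("/path/to/input")

val aggregated = df
  .groupBy("f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8") // ~8 grouping fields
  .agg(sum("amount"))

// coalesce(1) does NOT add a shuffle boundary: it narrows the stage above it
// down to a single task, so one executor ends up doing all of that work.
aggregated.coalesce(1).write.parquet("/path/to/output")

// Would repartition(1) be better here? It inserts a shuffle, so the
// aggregation keeps its parallelism and only the final write is one task:
// aggregated.repartition(1).write.parquet("/path/to/output")
```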
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n25850/Screen_Shot_2016-01-02_at_2.jpg>

Here is the heap-space error, which I don't understand how to resolve in my code:
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n25850/Screen_Shot_2016-01-02_at_2.jpg>

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-find-cause-waiting-threads-etc-of-hanging-job-for-7-hours-tp25850.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
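In case it helps, here is a sketch of the executor settings I am planning to try next so the GC activity around the heap-space error shows up in the executor logs; the memory value is illustrative, not tuned for my cluster:

```scala
import org.apache.spark.SparkConf

// Illustrative settings only -- the memory size needs tuning per cluster.
val conf = new SparkConf()
  .set("spark.executor.memory", "6g") // more heap headroom per executor
  .set("spark.executor.extraJavaOptions",
    "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps " + // write GC events to executor stderr
    "-XX:+HeapDumpOnOutOfMemoryError")              // dump the heap when the OOM hits
```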