Hi, I have a Spark job that hangs for around 7 hours or more, until Autosys kills it on timeout. The data is not huge, and I am fairly sure it is stuck in GC, but I can't find which part of my code causes it. I reuse almost all variables and try to minimize creating local objects, though I can't avoid creating many String objects in order to update DataFrame values. When I look at the live thread debug page for the executor running the job, I see the running/waiting threads attached below. Please guide me on how to find which waiting thread is the culprit preventing my job from finishing. My code does a DataFrame groupBy on around 8 fields and also calls coalesce(1) twice, so in the UI I can see it shuffling huge amounts of data, on the order of GBs per executor.
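For reference, here is a simplified sketch of the shape of the job I described; the column names, aggregation, and paths are placeholders, not my real code:

```scala
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

// Simplified shape of the job -- column and path names are placeholders.
val sqlContext = new SQLContext(sc)
val df = sqlContext.read.parquet("/path/to/input")

val aggregated = df
  .groupBy("f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8") // ~8 grouping fields
  .agg(sum("amount"))

// coalesce(1) does NOT add a shuffle boundary: it narrows the stage above it
// down to a single task, so one executor ends up doing all of that work.
aggregated.coalesce(1).write.parquet("/path/to/output")

// Would repartition(1) be better here? It inserts a shuffle, so the
// aggregation keeps its parallelism and only the final write is one task:
// aggregated.repartition(1).write.parquet("/path/to/output")
```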
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n25850/Screen_Shot_2016-01-02_at_2.jpg>

Here is the heap-space error, which I don't understand how to resolve in my code:
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n25850/Screen_Shot_2016-01-02_at_2.jpg>

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-find-cause-waiting-threads-etc-of-hanging-job-for-7-hours-tp25850.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
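In case it helps, here is a sketch of the executor settings I am planning to try next so the GC activity around the heap-space error shows up in the executor logs; the memory value is illustrative, not tuned for my cluster:

```scala
import org.apache.spark.SparkConf

// Illustrative settings only -- the memory size needs tuning per cluster.
val conf = new SparkConf()
  .set("spark.executor.memory", "6g") // more heap headroom per executor
  .set("spark.executor.extraJavaOptions",
    "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps " + // write GC events to executor stderr
    "-XX:+HeapDumpOnOutOfMemoryError")              // dump the heap when the OOM hits
```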