We took a heap dump from the live job. To our surprise, 85% of the memory is occupied by `org.apache.spark.scheduler.LiveListenerBus`. Here are a few pictures for context:
<http://apache-spark-user-list.1001560.n3.nabble.com/file/t10309/Screenshot_2020-09-11_at_10.png>

Mridul Muralidharan wrote
> Hi,
>
> 50% of driver time being spent in GC just for the listener bus sounds very
> high for a 30G heap.
> Did you try to take a heap dump and see what is occupying so much memory?
>
> This will help us determine whether the memory usage is due to some user
> code/library holding references to large objects/graphs of objects, or
> whether the memory usage is actually in listener/related code.
>
> Regards,
> Mridul
>
>
> On Tue, Aug 11, 2020 at 8:14 AM Teja <saiteja.parsi@> wrote:
>
>> We have ~120 executors with 5 cores each, for a very long-running job
>> which crunches ~2.5 TB of data and has a large number of filters in its
>> query. Currently, we have ~30k partitions, which makes ~90 MB per
>> partition.
>>
>> We are using Spark v2.2.2 as of now. The major problem we are facing is
>> due to GC on the driver. All of the driver memory (30G) is getting filled
>> and GC is very active, taking more than 50% of the runtime for Full GC
>> Evacuation. The heap dump indicates that 80% of the memory is occupied
>> by LiveListenerBus and is not being cleared by GC. Frequent GC runs are
>> clearing newly created objects only.
>>
>> From the Jira tickets, I got to know that memory consumption by
>> LiveListenerBus has been addressed in v2.3 (not sure of the specifics).
>> But until we evaluate migrating to v2.3, is there any quick fix or
>> workaround either to prevent the various listener events from piling up
>> in the driver's memory, or to identify and disable the listener that is
>> causing the delay in processing events?
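For reference, one stopgap that might be worth trying on v2.2.x for the question quoted above is lowering how much history the UI-related listeners retain. The sketch below only renders a few standard Spark properties as `spark-submit` arguments. The property names are real Spark settings, but the values, the `LISTENER_CONF` name, and the `to_submit_flags` helper are illustrative assumptions, and it is not confirmed that these listeners are what the heap dump shows.

```python
# Illustrative values only (assumptions, not recommendations); the Spark
# defaults are 1000 jobs, 1000 stages, 100000 tasks, 1000 SQL executions.
LISTENER_CONF = {
    "spark.ui.retainedJobs": 100,
    "spark.ui.retainedStages": 100,
    "spark.ui.retainedTasks": 10000,
    "spark.sql.ui.retainedExecutions": 50,
}


def to_submit_flags(conf):
    """Render a dict of Spark properties as spark-submit --conf arguments.

    Hypothetical helper for illustration; not part of Spark.
    """
    flags = []
    for key, value in sorted(conf.items()):
        flags.extend(["--conf", f"{key}={value}"])
    return flags


if __name__ == "__main__":
    # Prints the flags on one line, ready to paste into a spark-submit call.
    print(" ".join(to_submit_flags(LISTENER_CONF)))
```

Whether this helps depends on which objects actually sit under LiveListenerBus in the dump. If the retained memory is in the event queue itself rather than in the UI listeners, these settings are unlikely to change much.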
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org