We have ~120 executors with 5 cores each, for a very long-running job which crunches ~2.5 TB of data with has too many filters to query. Currently, we have ~30k partitions which make ~90MB per partition.
We are using Spark v2.2.2 as of now. The major problem we are facing is due to GC on the driver. All of the driver memory (30G) is getting filled and GC is very active, which is taking more than 50% of the runtime for Full GC Evacuation. The heap dump indicates that 80% of the memory is being occupied by LiveListenerBus and it's not being cleared by GC. Frequent GC runs are clearing newly created objects only. >From the Jira tickets, I got to know that Memory consumption by LiveListenerBus has been addressed in v2.3 (not sure of the specifics). But until we evaluate migrating to v2.3, is there any quick fix or workaround either to prevent various listerner events bulking up in driver's memory or to identify and disable the Listener which is causing the delay in processing events. -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ --------------------------------------------------------------------- To unsubscribe e-mail: user-unsubscr...@spark.apache.org