Hi Faiz,

We find G1GC works well for some of our workloads, in particular those that are Parquet-read intensive, and we have been using G1GC with Spark on Java 8 for a while (setting spark.driver.extraJavaOptions and spark.executor.extraJavaOptions to "-XX:+UseG1GC"). Currently we mostly run Spark (3.3 and higher) on Java 11. That said, the best approach is always to measure your specific workloads; let me know if you find something different.

BTW, besides the Web UI, I typically measure GC time with a couple of custom tools:
https://github.com/cerndb/spark-dashboard
https://github.com/LucaCanali/sparkMeasure

A few microbenchmarks of Spark reading Parquet with different JDKs are at:
https://db-blog.web.cern.ch/node/192
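For reference, here is a minimal sketch of how the GC flag can be passed at submit time (the master, class name, and jar are placeholders, not from an actual job):

```shell
# Enable G1GC on both the driver and the executor JVMs.
# Master URL, application class, and jar name are placeholder values.
spark-submit \
  --master yarn \
  --conf "spark.driver.extraJavaOptions=-XX:+UseG1GC" \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC" \
  --class com.example.MyApp \
  my-app.jar
```

Note that G1GC has been the HotSpot default since JDK 9, so the explicit flag mainly matters on Java 8; on Java 11 and 17 you would only need extraJavaOptions for further G1 tuning knobs.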
Best,
Luca

From: Faiz Halde <haldef...@gmail.com>
Sent: Thursday, December 7, 2023 23:25
To: user@spark.apache.org
Subject: Spark on Java 17

Hello,

We are planning to switch to Java 17 for Spark and were wondering if there are any obvious learnings from anybody related to JVM tuning. We have been running on Java 8 for a while and used the parallel GC, as that used to be a general recommendation for high-throughput systems. How has the default G1GC worked out with Spark?

Thanks,
Faiz