skonto opened a new pull request #25229: [SPARK-27900][K8s] Add oom jvm option
URL: https://github.com/apache/spark/pull/25229

## What changes were proposed in this pull request?

Adds a flag to the entrypoint script that makes the driver exit when it hits an OOM error. This follows the discussion here: https://github.com/apache/spark/pull/24796

## How was this patch tested?

Manually, by launching SparkPi with a large argument (`100000000`), which leads to an OOM because of the large number of tasks allocated.

```
$ kubectl logs spark-pi-8387506c19ad344d-driver -n spark
19/07/22 12:35:09 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.131.4.135:42944) with ID 2
19/07/22 12:35:09 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
19/07/22 12:35:09 INFO BlockManagerMasterEndpoint: Registering block manager 10.131.4.135:36134 with 413.9 MiB RAM, BlockManagerId(2, 10.131.4.135, 36134, None)
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 15"...
```

```
$ kubectl get pods spark-pi-8387506c19ad344d-driver -n spark -o yaml
....
**exitCode: 137**
finishedAt: "2019-07-22T12:37:17Z"
reason: Error
startedAt: "2019-07-22T12:34:53Z"
hostIP: 10.0.0.182
phase: Failed
podIP: 10.129.6.134
qosClass: Burstable
startTime: "2019-07-22T12:34:40Z"
```
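
For readers wondering why the pod reports exit code 137: a process terminated by a signal is conventionally reported with status 128 plus the signal number, and `kill -9` sends SIGKILL (signal 9), so 128 + 9 = 137. The snippet below is only an illustrative sketch of that convention using a plain `sleep` process as a stand-in; it is not the entrypoint change from this PR and does not involve a JVM.

```shell
#!/bin/sh
# Stand-in for the driver process (assumption: any long-running child will do).
sleep 30 &
pid=$!

# Same signal that -XX:OnOutOfMemoryError="kill -9 %p" sends to the JVM.
kill -9 "$pid"

# wait returns 128 + signal number when the child was killed by a signal.
wait "$pid"
status=$?

echo "exit status: $status"
```

The status observed locally matches the `exitCode: 137` shown in the pod's YAML above, which is how the kill triggered by the OOM handler shows up in the container's termination state.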