I have made some observations on running Spark on Kubernetes (AKA k8s) that I would like to share.

Spark on k8s works on the basis of the "one-container-per-Pod" model <https://kubernetes.io/docs/concepts/workloads/pods/>. In my setup that meant one Pod per node: one node runs the driver Pod and each remaining node runs one executor Pod. I tested this on Google Kubernetes Engine (GKE), dynamically adjusting the number of nodes with the following command:

gcloud container clusters resize $GKE_NAME --num-nodes=$NODES --zone $ZONE --quiet

Thus you can create a k8s cluster with $NODES nodes. Once your spark-submit job has finished, you can resize the cluster to zero to save money:

gcloud container clusters resize spark-on-gke --num-nodes=0 --zone $ZONE --quiet

Note that you cannot shut down a GKE cluster, but you can effectively reduce it to zero nodes.

So as a rule of thumb, set the number of executors to one less than the number of nodes. For a 5-node k8s cluster:

NODES=5
NEXEC=$((NODES-1))

--conf spark.executor.instances=$NEXEC \
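To put that parameter in context, this is roughly the shape of the spark-submit I used. Treat it as a sketch: the namespace, service account, image name and application file below are placeholders rather than my exact values.

spark-submit \
  --master k8s://https://$K8S_MASTER_IP:443 \
  --deploy-mode cluster \
  --name sparkBQ \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.executor.instances=$NEXEC \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
  --conf spark.kubernetes.container.image=eu.gcr.io/$PROJECT_ID/spark-py:3.1.1 \
  local:///opt/spark/work-dir/RandomDataBigQuery.py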
In this model, requesting more executors than there are nodes available for them leaves the surplus executor Pods stuck in Pending status, never to be deployed, as shown below:

kubectl get pod -n spark

NAME                                         READY   STATUS    RESTARTS   AGE
randomdatabigquery-b40dd67c791417bf-exec-1   1/1     Running   0          65s
randomdatabigquery-b40dd67c791417bf-exec-2   1/1     Running   0          65s
randomdatabigquery-b40dd67c791417bf-exec-3   1/1     Running   0          65s
randomdatabigquery-b40dd67c791417bf-exec-4   1/1     Running   0          65s
randomdatabigquery-b40dd67c791417bf-exec-5   0/1     Pending   0          65s
sparkbq-13d8857c7913e1d0-driver              1/1     Running   0          81s

The Spark UI can be accessed through the following port forwarding once the driver Pod has been created (at run time):

DRIVER_POD_NAME=`kubectl get pods -n $NAMESPACE | grep driver | awk '{print $1}'`
kubectl port-forward $DRIVER_POD_NAME 4040:4040 -n $NAMESPACE &

For this purpose I built a Docker image with Java 8, Spark 3.1.1 and Scala 2.12. Data was randomly generated in PySpark and saved to Google BigQuery (GBQ). Java 8 and Spark 3.1.1 were chosen for compatibility with GBQ. From my experience you will need to build your own Docker image (see the P.S. below for sketches of the image build and the job itself).

Performance-wise, it typically takes around 30 seconds to create the driver Pod, followed by the executors, so there is still some work to do there. The major advantage k8s offers for Spark is flexibility: you create the k8s cluster once, then dynamically adjust the number of nodes and reduce it to zero after running the job.

To submit the Spark job you need another VM box, or anything else that has the Spark binaries installed on it. For this purpose I installed Spark 3.1.1 on a VM with 2 vCPUs and 5.5 GB of memory. Note that as long as you have a deployable Docker image, that image can be shared by multiple k8s clusters, or you can have multiple Docker images to deploy.

Let me know your thoughts.

View my LinkedIn profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
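P.S. For the base Docker image, one way to start is the helper script that ships with the Spark distribution, bin/docker-image-tool.sh, which builds (and pushes) a PySpark-capable image from the bundled Dockerfile; you would then extend the resulting image with the GBQ connector jars. A sketch, with the repository name as a placeholder:

cd $SPARK_HOME

# Build a PySpark image from the bundled Python Dockerfile, tagged 3.1.1
./bin/docker-image-tool.sh -r eu.gcr.io/$PROJECT_ID -t 3.1.1 \
  -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile build

# Push it to the registry so the GKE nodes can pull it
./bin/docker-image-tool.sh -r eu.gcr.io/$PROJECT_ID -t 3.1.1 push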
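For the job itself, a minimal sketch of generating random data in PySpark and saving it to GBQ via the spark-bigquery connector. The dataset, table and staging bucket names are placeholders, and the connector jar (e.g. spark-bigquery-with-dependencies_2.12) needs to be on the classpath:

from pyspark.sql import SparkSession
from pyspark.sql.functions import rand

spark = SparkSession.builder.appName("sparkBQ").getOrCreate()

# Generate a million rows of random data: a monotonically increasing id
# column plus one random double per row
df = spark.range(0, 1000000).withColumn("random_value", rand(seed=42))

# The connector stages the data in a GCS bucket before loading it into BigQuery
df.write.format("bigquery") \
    .option("table", "my_dataset.random_data") \
    .option("temporaryGcsBucket", "my-staging-bucket") \
    .mode("overwrite") \
    .save()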