Hi, I have a basic question to ask.
I am running a Google Kubernetes Engine (GKE) cluster with three nodes, each an e2-standard-2 (2 vCPUs, 8 GB memory). spark-submit is launched from another node (actually a Dataproc single-node cluster that I have just upgraded to e2-custom, 4 vCPUs, 8 GB memory). We call this the launch node.

OK, I know the cluster is not much, but Google was complaining about the launch node hitting 100% CPU, so I added two more vCPUs to it. It appears that despite using k8s as the computational cluster, the burden falls on the launch node!

The CPU utilisation for the launch node is shown below:

[image: launch node CPU utilisation chart]

The dip is when the two extra vCPUs were added (the node had to reboot); usage sits at around 70%. The combined CPU usage for the GKE nodes is shown below:

[image: combined GKE nodes CPU utilisation chart]

It never goes above 20%! I can see the driver and executors as below:

k get pods -n spark
NAME                                         READY   STATUS    RESTARTS   AGE
pytest-c958c97b2c52b6ed-driver               1/1     Running   0          69s
randomdatabigquery-e68a8a7b2c52f468-exec-1   1/1     Running   0          51s
randomdatabigquery-e68a8a7b2c52f468-exec-2   1/1     Running   0          51s
randomdatabigquery-e68a8a7b2c52f468-exec-3   0/1     Pending   0          51s

It is a PySpark 3.1.1 image using Java 8, pushing randomly generated data into the Google BigQuery data warehouse. The last executor (exec-3) seems to be stuck in Pending.

The spark-submit is as below:

spark-submit --verbose \
  --properties-file ${property_file} \
  --master k8s://https://$KUBERNETES_MASTER_IP:443 \
  --deploy-mode cluster \
  --name pytest \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./pyspark_venv/bin/python \
  --py-files $CODE_DIRECTORY/DSBQ.zip \
  --conf spark.kubernetes.namespace=$NAMESPACE \
  --conf spark.executor.memory=5000m \
  --conf spark.network.timeout=300 \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.driver.limit.cores=1 \
  --conf spark.driver.cores=1 \
  --conf spark.executor.cores=1 \
  --conf spark.executor.memory=2000m \
  --conf spark.kubernetes.driver.docker.image=${IMAGEGCP} \
  --conf spark.kubernetes.executor.docker.image=${IMAGEGCP} \
  --conf spark.kubernetes.container.image=${IMAGEGCP} \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-bq \
  --conf spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
  --conf spark.executor.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
  --conf spark.sql.execution.arrow.pyspark.enabled="true" \
  $CODE_DIRECTORY/${APPLICATION}

With --deploy-mode cluster, aren't the driver and the executors supposed to be running on the k8s cluster? So why is the launch node so heavily used while the k8s cluster stays underutilised?

Thanks
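P.S. In case it helps, the job itself does roughly the following. This is only a simplified sketch, not the actual code, and the dataset, table and bucket names below are placeholders:

from pyspark.sql import SparkSession
from pyspark.sql.functions import rand, randn

spark = SparkSession.builder.appName("RandomDataBigQuery").getOrCreate()

# Build a DataFrame of random rows (the real job generates something similar).
df = (spark.range(0, 100000)
      .withColumn("random_value", rand(seed=42))
      .withColumn("random_gauss", randn(seed=42)))

# Write to BigQuery through the spark-bigquery connector. The indirect write
# path stages the data in a temporary GCS bucket before loading it into BigQuery.
(df.write
   .format("bigquery")
   .option("table", "test.randomData")                   # placeholder dataset.table
   .option("temporaryGcsBucket", "tmp-staging-bucket")   # placeholder bucket
   .mode("append")
   .save())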