I found some of my notes on Volcano and my tests from back in Feb 2022. I ran the Volcano tests on Spark 3.1.1, and the results were not great at the time. Hence my question in the thread from @santosh about whether any updated comparisons are available. I will try the test with Spark 3.4.1 at some point. Perhaps some users have run tests on Volcano with newer versions of Spark that they care to share?
Thanks

Forwarded Conversation
Subject: Recap on current status of "SPIP: Support Customized Kubernetes Schedulers"

----------
From: Mich Talebzadeh <mich.talebza...@gmail.com>
Date: Thu, 24 Feb 2022 at 09:16
To: Yikun Jiang <yikunk...@gmail.com>
Cc: dev <d...@spark.apache.org>, Dongjoon Hyun <dongj...@apache.org>, Holden Karau <hol...@pigscanfly.ca>, William Wang <wang.platf...@gmail.com>, Attila Zsolt Piros <piros.attila.zs...@gmail.com>, Hyukjin Kwon <gurwls...@gmail.com>, <mgrigo...@apache.org>, Weiwei Yang <w...@apache.org>, Thomas Graves <tgraves...@gmail.com>

Hi,

What do you expect the performance gain to be from using Volcano versus the standard scheduler? Just to be sure, there are two aspects here:

1. Procuring the Kubernetes cluster
2. Running the job through spark-submit

Item 1 is left untouched, and we should see improvements in item 2 with Volcano.

Thanks

view my LinkedIn profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

----------
From: Mich Talebzadeh <mich.talebza...@gmail.com>
Date: Thu, 24 Feb 2022 at 23:35
To: Yikun Jiang <yikunk...@gmail.com>
Cc: dev <d...@spark.apache.org>, Dongjoon Hyun <dongj...@apache.org>, Holden Karau <hol...@pigscanfly.ca>, William Wang <wang.platf...@gmail.com>, Attila Zsolt Piros <piros.attila.zs...@gmail.com>, Hyukjin Kwon <gurwls...@gmail.com>, <mgrigo...@apache.org>, Weiwei Yang <w...@apache.org>, Thomas Graves <tgraves...@gmail.com>

I did some preliminary tests without Volcano and with the Volcano additions to spark-submit.
*Setup*

The K8s cluster used was a Google Kubernetes Engine standard cluster with three nodes, with autoscaling up to 6 nodes. It runs *Spark 3.1.1*, with spark-py dockers also using *Spark 3.1.1 with Java 8*. In every run, the job creates a million rows of random data and inserts them from a Spark DataFrame into a Google BigQuery table. The choice of Spark 3.1.1 and Java 8 was for compatibility between the Spark API and BigQuery.

To keep the systematics the same, I used the same cluster, with the only difference being the additional spark-submit lines below for Volcano (in bold):

NEXEC=2
MEMORY="8192m"
VCORES=3
FEATURES="org.apache.spark.deploy.k8s.features.VolcanoFeatureStep"

gcloud config set compute/zone $ZONE
export PROJECT=$(gcloud info --format='value(config.project)')
gcloud container clusters get-credentials ${CLUSTER_NAME} --zone $ZONE
export KUBERNETES_MASTER_IP=$(gcloud container clusters list --filter name=${CLUSTER_NAME} --format='value(MASTER_IP)')

spark-submit --verbose \
  --properties-file ${property_file} \
  --master k8s://https://$KUBERNETES_MASTER_IP:443 \
  --deploy-mode cluster \
  --name sparkBQ \
  *--conf spark.kubernetes.scheduler=volcano \*
  *--conf spark.kubernetes.driver.pod.featureSteps=$FEATURES \*
  *--conf spark.kubernetes.executor.pod.featureSteps=$FEATURES \*
  *--conf spark.kubernetes.job.queue=queue1 \*
  --py-files $CODE_DIRECTORY_CLOUD/spark_on_gke.zip \
  --conf spark.kubernetes.namespace=$NAMESPACE \
  --conf spark.executor.instances=$NEXEC \
  --conf spark.driver.cores=$VCORES \
  --conf spark.executor.cores=$VCORES \
  --conf spark.driver.memory=$MEMORY \
  --conf spark.executor.memory=$MEMORY \
  --conf spark.network.timeout=300 \
  --conf spark.kubernetes.allocation.batch.size=3 \
  --conf spark.kubernetes.allocation.batch.delay=1 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.kubernetes.driver.container.image=${IMAGEDRIVER} \
  --conf spark.kubernetes.executor.container.image=${IMAGEDRIVER} \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-bq \
  --conf spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
  --conf spark.executor.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" \
  --conf spark.kubernetes.authenticate.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  --conf spark.kubernetes.authenticate.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token \
  $CODE_DIRECTORY_CLOUD/${APPLICATION}

In contrast, the standard spark-submit does not have those 4 Volcano-specific lines (in bold). This is the output from *spark-submit --verbose*:

Spark properties used, including those specified through --conf and those from the properties file /home/hduser/dba/bin/python/spark_on_gke/deployment/src/scripts/properties:

(spark.kubernetes.executor.secrets.spark-sa,*********(redacted))
(spark.dynamicAllocation.shuffleTracking.enabled,true)
(spark.kubernetes.allocation.batch.delay,1)
(spark.kubernetes.driverEnv.GOOGLE_APPLICATION_CREDENTIALS,*********(redacted))
*(spark.kubernetes.executor.pod.featureSteps,"org.apache.spark.deploy.k8s.features.VolcanoFeatureStep")*
(spark.driver.memory,8192m)
(spark.network.timeout,300)
(spark.executor.memory,8192m)
(spark.executor.instances,2)
(spark.hadoop.fs.gs.project.id,xxx)
(spark.kubernetes.allocation.batch.size,3)
(spark.hadoop.google.cloud.auth.service.account.json.keyfile,*********(redacted))
*(spark.kubernetes.scheduler,volcano)*
(spark.kubernetes.namespace,spark)
(spark.kubernetes.authenticate.driver.serviceAccountName,spark-bq)
(spark.kubernetes.executor.container.image,eu.gcr.io/xxx/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-java8PlusPackages)
(spark.driver.cores,3)
(spark.kubernetes.driverEnv.GCS_PROJECT_ID,xxx)
(spark.executor.extraJavaOptions,-Dio.netty.tryReflectionSetAccessible=true)
(spark.executorEnv.GCS_PROJECT_ID,xxx)
(spark.hadoop.google.cloud.auth.service.account.enable,true)
(spark.driver.extraJavaOptions,-Dio.netty.tryReflectionSetAccessible=true)
*(spark.kubernetes.job.queue,queue1)*
(spark.kubernetes.authenticate.caCertFile,*********(redacted))
(spark.kubernetes.driver.secrets.spark-sa,*********(redacted))
(spark.executorEnv.GOOGLE_APPLICATION_CREDENTIALS,*********(redacted))
(spark.kubernetes.authenticate.oauthTokenFile,*********(redacted))
(spark.dynamicAllocation.enabled,true)
(spark.kubernetes.driver.container.image,eu.gcr.io/xxx/spark-py:3.1.1-scala_2.12-8-jre-slim-buster-java8PlusPackages)
*(spark.kubernetes.driver.pod.featureSteps,"org.apache.spark.deploy.k8s.features.VolcanoFeatureStep")*
(spark.executor.cores,3)

So I ran the spark-submit job 8 times for each configuration in sequence, namely the *standard setup followed by the Volcano setup*. The timings were measured from the Python code at the start and the end, excluding the cluster creation times; simply:

start_time = time.time()
..code
end_time = time.time()
time_elapsed = (end_time - start_time)
print(f"""Elapsed time in seconds is {time_elapsed}""")

*Results*

These are the results of each run, with timings in seconds:

[image: image.png]

At this first instance Volcano compares poorly (avg ~96 seconds compared to ~92 seconds). So, given the deviations, I need to investigate the reasons for the larger fluctuations in time taken by the Volcano runs. Any comments are welcome.

HTH

----------
From: Yikun Jiang <yikunk...@gmail.com>
Date: Fri, 25 Feb 2022 at 02:51
To: dev <d...@spark.apache.org>
Cc: Dongjoon Hyun <dongj...@apache.org>, Holden Karau <hol...@pigscanfly.ca>, William Wang <wang.platf...@gmail.com>, <piros.attila.zs...@gmail.com>, Hyukjin Kwon <gurwls...@gmail.com>, <mgrigo...@apache.org>, Weiwei Yang <w...@apache.org>, Thomas Graves <tgraves...@gmail.com>

@dongjoon-hyun @yangwwei Thanks!
@Mich Thanks for testing it. I'm not very familiar with GKE, and I'm also not quite sure whether it differs from upstream K8s in configuration, internal networking, or the scheduler implementation itself. As far as I know, different K8s vendors also maintain their own optimizations in their downstream products. But you can see some basic integration test results based on upstream K8s on x86/arm64:

- x86: https://github.com/apache/spark/pull/35422#issuecomment-1035901775
- Arm64: https://github.com/apache/spark/pull/35422#issuecomment-1037039764

As can be seen from the results, for a single job there is no big difference between the default scheduler and Volcano. Custom schedulers such as Volcano and YuniKorn are aimed more at the overall situation with multiple jobs and the utilization of the entire K8s cluster.

----------
From: Mich Talebzadeh <mich.talebza...@gmail.com>
Date: Fri, 25 Feb 2022 at 09:50
To: Yikun Jiang <yikunk...@gmail.com>
Cc: dev <d...@spark.apache.org>, Dongjoon Hyun <dongj...@apache.org>, Holden Karau <hol...@pigscanfly.ca>, William Wang <wang.platf...@gmail.com>, Attila Zsolt Piros <piros.attila.zs...@gmail.com>, Hyukjin Kwon <gurwls...@gmail.com>, <mgrigo...@apache.org>, Weiwei Yang <w...@apache.org>, Thomas Graves <tgraves...@gmail.com>

Hi Yikun,

GKE <https://cloud.google.com/kubernetes-engine> is Google's Kubernetes engine, first in the market and pretty stable. The cluster deployed is a 3-node GKE cluster with 4 vCores and 16GB of RAM each. Autoscaling is on to take nodes from 3 to 6, so it is pretty robust. I did 15 sequences of tests with the following results:

[image: image.png]

Again, the readings from the standard spark-submit are pretty stable, with a standard deviation of 3, compared to Volcano with a standard deviation of 13.6. What is the latency for FEATURES="org.apache.spark.deploy.k8s.features.VolcanoFeatureStep"? Could that be one reason?
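As an aside, the kind of summary statistics quoted in this thread (mean and standard deviation per configuration) can be reproduced with a few lines of Python around the timing snippet shown earlier. A minimal sketch; the timing lists below are illustrative placeholders, not the measured values (those are in the attached spreadsheet):

```python
import statistics

# Illustrative placeholder timings in seconds -- NOT the measured runs,
# which are in the attached spreadsheet image.
standard_runs = [90.1, 92.4, 91.8, 93.0, 92.2]
volcano_runs = [85.0, 110.3, 92.7, 101.5, 96.9]

def summarise(timings):
    """Return (mean, sample standard deviation) over a series of run timings."""
    return statistics.mean(timings), statistics.stdev(timings)

for label, runs in (("standard", standard_runs), ("volcano", volcano_runs)):
    mean, stdev = summarise(runs)
    print(f"{label}: mean={mean:.1f}s stdev={stdev:.1f}s")
```

A large stdev relative to the mean, as reported for the Volcano runs, is what motivates looking for a per-run source of jitter rather than a constant overhead.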
Regards,

Mich

----------
From: Mich Talebzadeh <mich.talebza...@gmail.com>
Date: Fri, 25 Feb 2022 at 11:38
To: Yikun Jiang <yikunk...@gmail.com>
Cc: dev <d...@spark.apache.org>, Dongjoon Hyun <dongj...@apache.org>, Holden Karau <hol...@pigscanfly.ca>, William Wang <wang.platf...@gmail.com>, Attila Zsolt Piros <piros.attila.zs...@gmail.com>, Hyukjin Kwon <gurwls...@gmail.com>, <mgrigo...@apache.org>, Weiwei Yang <w...@apache.org>, Thomas Graves <tgraves...@gmail.com>

Spreadsheet with the actual size:

[image: 26243d69-ac3a-43f1-b2cc-903f9744237b.png]

Also the spec for the GKE cluster build:

gcloud beta container \
  --project "xxx" clusters create "spark-on-gke" \
  --zone "europe-west2-c" \
  --no-enable-basic-auth \
  --cluster-version "1.21.6-gke.1500" \
  --release-channel "regular" \
  --machine-type "e2-standard-4" \
  --image-type "COS_CONTAINERD" \
  --disk-type "pd-standard" \
  --disk-size "100" \
  --metadata disable-legacy-endpoints=true \
  --scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
  --max-pods-per-node "110" \
  --num-nodes "3" \
  --logging=SYSTEM,WORKLOAD \
  --monitoring=SYSTEM \
  --enable-ip-alias \
  --network "projects/xxx/global/networks/default" \
  --subnetwork "projects/xxx/regions/europe-west2/subnetworks/default" \
  --no-enable-intra-node-visibility \
  --default-max-pods-per-node "110" \
  --enable-autoscaling \
  --min-nodes "3" \
  --max-nodes "6" \
  --no-enable-master-authorized-networks \
  --addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver \
  --enable-autoupgrade \
  --enable-autorepair \
  --max-surge-upgrade 1 \
  --max-unavailable-upgrade 0 \
  --enable-shielded-nodes \
  --node-locations "europe-west2-c"
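One detail implicit in the spark-submit above: spark.kubernetes.job.queue=queue1 assumes a Volcano queue named queue1 already exists on the cluster (and that Volcano itself is installed there). A minimal sketch of such a queue definition, applied with kubectl apply -f; the weight and capability values below are illustrative, not taken from the original tests:

```yaml
# Hypothetical Volcano queue matching spark.kubernetes.job.queue=queue1.
# weight/capability values are illustrative only.
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: queue1
spec:
  weight: 1            # relative share when the cluster is contended
  capability:
    cpu: "12"          # rough cap for 3 x e2-standard-4 nodes (4 vCPU each)
    memory: "48Gi"
```

With no such queue present (or with a queue that has insufficient capability), driver pods can sit pending in the Volcano scheduler, which is one plausible source of the extra per-run variance discussed above.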