I use https://github.com/kubeflow/spark-operator rather than the Bitnami chart, but https://medium.com/@kayvan.sol2/spark-on-kubernetes-d566158186c6 shows running spark-submit via exec into the master pod. Might be something to try; rough sketches of both approaches are inline below.
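With the operator, each job is expressed as a SparkApplication custom resource and the operator creates the driver and executor pods for you. A minimal sketch, assuming the operator is installed and watching the spark namespace; the name, image, service account, and script path are placeholders to adjust:

```
# Minimal SparkApplication for kubeflow/spark-operator (v1beta2 API).
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-on-gke              # placeholder job name
  namespace: spark
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: apache/spark-py:v3.1.3   # match the Spark version you run
  mainApplicationFile: local:///opt/spark/app/StructuredStream-on-gke.py  # path inside the image
  sparkVersion: "3.1.3"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark         # placeholder; needs RBAC to create pods
  executor:
    instances: 2
    cores: 1
    memory: 512m
```

kubectl apply -f that and check progress with kubectl get sparkapplications -n spark.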
On Mon, Aug 26, 2024 at 12:22 PM karan alang <karan.al...@gmail.com> wrote:

> We are currently using Dataproc on GCP for running our Spark workloads,
> and I'm planning to move this workload to Kubernetes (GKE).
>
> Here is what is done so far:
>
> Installed Spark using the Bitnami Helm chart:
>
> ```
> helm repo add bitnami https://charts.bitnami.com/bitnami
> helm install spark -f sparkConfig.yaml bitnami/spark -n spark
> ```
>
> Also deployed a LoadBalancer; YAML used:
>
> ```
> apiVersion: v1
> kind: Service
> metadata:
>   name: spark-master-lb
>   labels:
>     app: spark
>     component: LoadBalancer
> spec:
>   selector:
>     app.kubernetes.io/component: master
>     app.kubernetes.io/instance: spark
>     app.kubernetes.io/name: spark
>   ports:
>     - name: webui
>       port: 8080
>       targetPort: 8080
>     - name: master
>       port: 7077
>       targetPort: 7077
>   type: LoadBalancer
> ```
>
> Spark is installed, and the pods have come up.
>
> When I try to do a spark-submit in cluster mode, it gives the following error:
>
> ```
> (base) Karans-MacBook-Pro:fromEdward-jan26 karanalang$ $SPARK_HOME/bin/spark-submit --master spark://<EXTERNAL_IP>:7077 --deploy-mode cluster --name spark-on-gke local:///Users/karanalang/Documents/Technology/0.spark-on-gke/StructuredStream-on-gke.py
> 24/08/26 12:03:26 WARN Utils: Your hostname, Karans-MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 10.42.28.138 instead (on interface en0)
> 24/08/26 12:03:26 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/karanalang/Documents/Technology/spark-3.1.3-bin-hadoop3.2/jars/spark-unsafe_2.12-3.1.3.jar) to constructor java.nio.DirectByteBuffer(long,int)
> WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
> WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> Exception in thread "main" org.apache.spark.SparkException: Cluster deploy mode is currently not supported for python applications on standalone clusters.
>         at org.apache.spark.deploy.SparkSubmit.error(SparkSubmit.scala:968)
>         at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:273)
>         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
>         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> ```
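That first exception is the real blocker: a standalone master (spark://host:7077) cannot run Python applications in cluster deploy mode at all, so no amount of LoadBalancer wiring will fix it. If you want cluster mode for PySpark, one option is to skip the standalone master and use Spark's native Kubernetes mode, where spark-submit talks to the API server directly. A rough sketch, assuming Spark 3.1.3; the API server address, image, service account, and script path are placeholders:

```
# Native Kubernetes mode: the API server schedules driver/executor pods,
# so no standalone master is involved. Get <API_SERVER> from `kubectl cluster-info`.
$SPARK_HOME/bin/spark-submit \
  --master k8s://https://<API_SERVER>:443 \
  --deploy-mode cluster \
  --name spark-on-gke \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.kubernetes.container.image=apache/spark-py:v3.1.3 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark/app/StructuredStream-on-gke.py  # path inside the image, not your Mac
```

Note that local:/// in this mode means a path inside the container image, which is why the job file has to be baked into (or mounted into) the image rather than read from your laptop.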
> In client mode, it gives the following error:
>
> ```
> 24/08/26 12:06:58 ERROR SparkContext: Error initializing SparkContext.
> java.lang.NullPointerException
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:640)
>         at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
>         at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>         at py4j.Gateway.invoke(Gateway.java:238)
>         at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
>         at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
>         at py4j.GatewayConnection.run(GatewayConnection.java:238)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> 24/08/26 12:06:58 INFO SparkContext: SparkContext already stopped.
> ```
>
> Couple of questions:
>
> 1. Is using the Helm chart the correct way to install Apache Spark on GKE/k8s? (Note: need to install on both GKE and on-prem Kubernetes.)
>
> 2. How do I submit PySpark jobs on a Spark cluster deployed on GKE (e.g. do I need to create a K8s Deployment for each Spark job)?
>
> tia !
>
> Here is the stackoverflow link:
> https://stackoverflow.com/questions/78915988/unable-to-deploy-pyspark-application-on-gke-spark-installed-using-bitnami-helm
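On the client-mode NullPointerException: submitting from your Mac means the driver runs on the laptop, and the executors in GKE have to connect back to it; the loopback-address warning in the earlier log points the same way. The article linked above works around this by exec-ing into the master pod and submitting from there, so the driver lives on the pod network. A sketch assuming the Bitnami chart's default names for a release called spark (verify with kubectl get pods,svc -n spark):

```
# Copy the job into the master pod, then submit in client mode from inside
# the cluster. Pod and service names below are assumed Bitnami defaults.
kubectl cp StructuredStream-on-gke.py spark/spark-master-0:/tmp/
kubectl exec -it -n spark spark-master-0 -- \
  spark-submit \
    --master spark://spark-master-svc:7077 \
    --deploy-mode client \
    /tmp/StructuredStream-on-gke.py
```

As for question 2: no, you don't need a Deployment per job either way. With the operator each job is one SparkApplication resource (see the sketch at the top), and in native Kubernetes mode each spark-submit creates its own driver and executor pods for the lifetime of the job.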