I use https://github.com/kubeflow/spark-operator rather than the Bitnami chart, but https://medium.com/@kayvan.sol2/spark-on-kubernetes-d566158186c6 shows running spark-submit via exec into the master pod. Might be something to try; rough sketches of both approaches are inline below.
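With the operator, each job is expressed as a SparkApplication custom resource and the operator creates the driver and executor pods for you. A minimal sketch, assuming the operator is installed and watching the spark namespace; the name, image, service account, and script path are placeholders to adjust:

```
# Minimal SparkApplication for kubeflow/spark-operator (v1beta2 API).
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-on-gke              # placeholder job name
  namespace: spark
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: apache/spark-py:v3.1.3   # match the Spark version you run
  mainApplicationFile: local:///opt/spark/app/StructuredStream-on-gke.py  # path inside the image
  sparkVersion: "3.1.3"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark         # placeholder; needs RBAC to create pods
  executor:
    instances: 2
    cores: 1
    memory: 512m
```

kubectl apply -f that and check progress with kubectl get sparkapplications -n spark.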
On Mon, Aug 26, 2024 at 12:22 PM karan alang <karan.al...@gmail.com> wrote:

> We are currently using Dataproc on GCP for running our Spark workloads,
> and I'm planning to move this workload to Kubernetes (GKE).
>
> Here is what is done so far:
>
> Installed Spark using the Bitnami Helm chart:
>
> ```
> helm repo add bitnami https://charts.bitnami.com/bitnami
> helm install spark -f sparkConfig.yaml bitnami/spark -n spark
> ```
>
> Also deployed a LoadBalancer; YAML used:
>
> ```
> apiVersion: v1
> kind: Service
> metadata:
>   name: spark-master-lb
>   labels:
>     app: spark
>     component: LoadBalancer
> spec:
>   selector:
>     app.kubernetes.io/component: master
>     app.kubernetes.io/instance: spark
>     app.kubernetes.io/name: spark
>   ports:
>     - name: webui
>       port: 8080
>       targetPort: 8080
>     - name: master
>       port: 7077
>       targetPort: 7077
>   type: LoadBalancer
> ```
>
> Spark is installed, and the pods have come up.
>
> When I try to do a spark-submit in cluster mode, it gives the following error:
>
> ```
> (base) Karans-MacBook-Pro:fromEdward-jan26 karanalang$ $SPARK_HOME/bin/spark-submit --master spark://<EXTERNAL_IP>:7077 --deploy-mode cluster --name spark-on-gke local:///Users/karanalang/Documents/Technology/0.spark-on-gke/StructuredStream-on-gke.py
> 24/08/26 12:03:26 WARN Utils: Your hostname, Karans-MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 10.42.28.138 instead (on interface en0)
> 24/08/26 12:03:26 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/karanalang/Documents/Technology/spark-3.1.3-bin-hadoop3.2/jars/spark-unsafe_2.12-3.1.3.jar) to constructor java.nio.DirectByteBuffer(long,int)
> WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
> WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> Exception in thread "main" org.apache.spark.SparkException: Cluster deploy mode is currently not supported for python applications on standalone clusters.
>         at org.apache.spark.deploy.SparkSubmit.error(SparkSubmit.scala:968)
>         at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:273)
>         at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
>         at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>         at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>         at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> ```
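That first exception is the real blocker: a standalone master (spark://host:7077) cannot run Python applications in cluster deploy mode at all, so no amount of LoadBalancer wiring will fix it. If you want cluster mode for PySpark, one option is to skip the standalone master and use Spark's native Kubernetes mode, where spark-submit talks to the API server directly. A rough sketch, assuming Spark 3.1.3; the API server address, image, service account, and script path are placeholders:

```
# Native Kubernetes mode: the API server schedules driver/executor pods,
# so no standalone master is involved. Get <API_SERVER> from `kubectl cluster-info`.
$SPARK_HOME/bin/spark-submit \
  --master k8s://https://<API_SERVER>:443 \
  --deploy-mode cluster \
  --name spark-on-gke \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.kubernetes.container.image=apache/spark-py:v3.1.3 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark/app/StructuredStream-on-gke.py  # path inside the image, not your Mac
```

Note that local:/// in this mode means a path inside the container image, which is why the job file has to be baked into (or mounted into) the image rather than read from your laptop.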
> In client mode, it gives the following error:
>
> ```
> 24/08/26 12:06:58 ERROR SparkContext: Error initializing SparkContext.
> java.lang.NullPointerException
>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:640)
>         at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
>         at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>         at py4j.Gateway.invoke(Gateway.java:238)
>         at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
>         at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
>         at py4j.GatewayConnection.run(GatewayConnection.java:238)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> 24/08/26 12:06:58 INFO SparkContext: SparkContext already stopped.
> ```
>
> Couple of questions:
>
> 1. Is using the Helm chart the correct way to install Apache Spark on GKE/k8s? (Note: need to install on both GKE and on-prem Kubernetes.)
>
> 2. How do I submit PySpark jobs on a Spark cluster deployed on GKE (e.g. do I need to create a K8s Deployment for each Spark job)?
>
> tia !
>
> Here is the stackoverflow link:
> https://stackoverflow.com/questions/78915988/unable-to-deploy-pyspark-application-on-gke-spark-installed-using-bitnami-helm
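On the client-mode NullPointerException: submitting from your Mac means the driver runs on the laptop, and the executors in GKE have to connect back to it; the loopback-address warning in the earlier log points the same way. The article linked above works around this by exec-ing into the master pod and submitting from there, so the driver lives on the pod network. A sketch assuming the Bitnami chart's default names for a release called spark (verify with kubectl get pods,svc -n spark):

```
# Copy the job into the master pod, then submit in client mode from inside
# the cluster. Pod and service names below are assumed Bitnami defaults.
kubectl cp StructuredStream-on-gke.py spark/spark-master-0:/tmp/
kubectl exec -it -n spark spark-master-0 -- \
  spark-submit \
    --master spark://spark-master-svc:7077 \
    --deploy-mode client \
    /tmp/StructuredStream-on-gke.py
```

As for question 2: no, you don't need a Deployment per job either way. With the operator each job is one SparkApplication resource (see the sketch at the top), and in native Kubernetes mode each spark-submit creates its own driver and executor pods for the lifetime of the job.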