Your submit command:

spark-submit --master k8s://https://34.74.22.140:7077 \
  --deploy-mode cluster \
  --name pyspark-example \
  --conf spark.kubernetes.container.image=pyspark-example:0.1 \
  --conf spark.kubernetes.file.upload.path=/myexample \
  src/StructuredStream-on-gke.py


Pay attention to this option:

--conf spark.kubernetes.file.upload.path

It refers to a staging location for your Python package on GCS storage, not to a path inside the Docker image itself.


From
https://spark.apache.org/docs/latest/running-on-kubernetes.html#dependency-management


"... The app jar file will be uploaded to the S3 and then when the driver
is launched it will be downloaded to the driver pod and will be added to
its classpath. Spark will generate a subdir under the upload path with a
random name to avoid conflicts with spark apps running in parallel. User
could manage the subdirs created according to his needs..."


In your case the scheme is gs:// (Google Cloud Storage), not S3.


There is no point in putting your Python file in the Docker image itself!
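As a sketch, the corrected command points the upload path at a bucket rather than a container-local directory. The bucket name gs://my-example-bucket is purely an illustration, and this assumes the GCS connector and credentials are available to spark-submit on the client machine:

```shell
# Stage src/StructuredStream-on-gke.py in GCS; Spark creates a random
# spark-upload-<uuid> subdirectory under the upload path for each
# submission, and the driver pod then downloads the file from there.
spark-submit \
  --master k8s://https://34.74.22.140:7077 \
  --deploy-mode cluster \
  --name pyspark-example \
  --conf spark.kubernetes.container.image=pyspark-example:0.1 \
  --conf spark.kubernetes.file.upload.path=gs://my-example-bucket/spark-uploads \
  src/StructuredStream-on-gke.py
```

Alternatively, since the Dockerfile already copies the script into the image, the upload path can be skipped entirely by submitting local:///myexample/StructuredStream-on-gke.py — the local:// scheme tells Spark the file is already present inside the container.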


HTH


View my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Wed, 15 Feb 2023 at 07:46, karan alang <karan.al...@gmail.com> wrote:

> Hi Ye,
>
> This is the error I get when I don't set the
> spark.kubernetes.file.upload.path property.
>
> Any ideas on how to fix this?
>
> ```
>
> Exception in thread "main" org.apache.spark.SparkException: Please specify
> spark.kubernetes.file.upload.path property.
>   at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:299)
>   at org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:248)
>   at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
>   at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:238)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:247)
>   at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:173)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:164)
>   at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:60)
>   at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>   at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>   at scala.collection.immutable.List.foldLeft(List.scala:89)
>   at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
>   at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:106)
>   at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
>   at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
>   at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2622)
>   at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
>   at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
>   at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
>   at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> ```
>
> On Tue, Feb 14, 2023 at 1:33 AM Ye Xianjin <advance...@gmail.com> wrote:
>
>> The configuration of ‘…file.upload.path’ is wrong. It specifies a
>> distributed filesystem path where Spark temporarily stages your
>> archives/resources/jars before distributing them to the drivers/executors.
>> For your case, you don’t need to set this configuration.
>> Sent from my iPhone
>>
>> On Feb 14, 2023, at 5:43 AM, karan alang <karan.al...@gmail.com> wrote:
>>
>> 
>> Hello All,
>>
>> I'm trying to run a simple application on GKE (Kubernetes), and it is
>> failing.
>> Note: I have Spark (the Bitnami Spark chart) installed on GKE using
>> helm install.
>>
>> Here is what was done:
>> 1. Created a Docker image using the Dockerfile below.
>>
>> Dockerfile:
>> ```
>>
>> FROM python:3.7-slim
>>
>> RUN apt-get update && \
>>     apt-get install -y default-jre && \
>>     apt-get install -y openjdk-11-jre-headless && \
>>     apt-get clean
>>
>> ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64
>>
>> RUN pip install pyspark
>> RUN mkdir -p /myexample && chmod 755 /myexample
>> WORKDIR /myexample
>>
>> COPY src/StructuredStream-on-gke.py /myexample/StructuredStream-on-gke.py
>>
>> CMD ["pyspark"]
>>
>> ```
>> Simple PySpark application:
>> ```
>>
>> from pyspark.sql import SparkSession
>> spark = SparkSession.builder.appName("StructuredStreaming-on-gke").getOrCreate()
>>
>> data = [('k1', 123000), ('k2', 234000), ('k3', 456000)]
>> df = spark.createDataFrame(data, ('id', 'salary'))
>>
>> df.show(5, False)
>>
>> ```
>>
>> Spark-submit command:
>> ```
>>
>> spark-submit --master k8s://https://34.74.22.140:7077 \
>>   --deploy-mode cluster \
>>   --name pyspark-example \
>>   --conf spark.kubernetes.container.image=pyspark-example:0.1 \
>>   --conf spark.kubernetes.file.upload.path=/myexample \
>>   src/StructuredStream-on-gke.py
>> ```
>>
>> Error I get:
>> ```
>>
>> 23/02/13 13:18:27 INFO KubernetesUtils: Uploading file:
>> /Users/karanalang/PycharmProjects/Kafka/pyspark-docker/src/StructuredStream-on-gke.py
>> to dest:
>> /myexample/spark-upload-12228079-d652-4bf3-b907-3810d275124a/StructuredStream-on-gke.py...
>>
>> Exception in thread "main" org.apache.spark.SparkException: Uploading file
>> /Users/karanalang/PycharmProjects/Kafka/pyspark-docker/src/StructuredStream-on-gke.py
>> failed...
>>   at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:296)
>>   at org.apache.spark.deploy.k8s.KubernetesUtils$.renameMainAppResource(KubernetesUtils.scala:270)
>>   at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configureForPython(DriverCommandFeatureStep.scala:109)
>>   at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configurePod(DriverCommandFeatureStep.scala:44)
>>   at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:59)
>>   at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>>   at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>>   at scala.collection.immutable.List.foldLeft(List.scala:89)
>>   at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
>>   at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:106)
>>   at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
>>   at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
>>   at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2622)
>>   at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
>>   at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
>>   at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
>>   at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>>   at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
>>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
>>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> Caused by: org.apache.spark.SparkException: Error uploading file
>> StructuredStream-on-gke.py
>>   at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:319)
>>   at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:292)
>>   ... 21 more
>> Caused by: java.io.IOException: Mkdirs failed to create
>> /myexample/spark-upload-12228079-d652-4bf3-b907-3810d275124a
>>   at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:317)
>>   at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:305)
>>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
>>   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:987)
>>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:414)
>>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:387)
>>   at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:2369)
>>   at org.apache.hadoop.fs.FilterFileSystem.copyFromLocalFile(FilterFileSystem.java:368)
>>   at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileToHadoopCompatibleFS(KubernetesUtils.scala:316)
>>   ... 22 more
>> ```
>>
>> Any ideas on how to fix this and get it to work?
>> tia!
>>
>> Please see the Stack Overflow link:
>>
>>
>> https://stackoverflow.com/questions/75441360/running-spark-application-on-gke-failing-on-spark-submit
>>
>>
