The Spark documentation describes spark-submit --files as: --files FILES: comma-separated list of files to be placed in the working directory of each executor.
I have implemented this for Kubernetes as per the Spark documentation <https://spark.apache.org/docs/latest/running-on-kubernetes.html> as follows:

export VOLUME_TYPE=hostPath
export VOLUME_NAME=minikube-mount
export SOURCE_DIR=/d4T/hduser/minikube
export MOUNT_PATH=$SOURCE_DIR/mnt

spark-submit --verbose \
  --master k8s://$K8S_SERVER \
  --deploy-mode cluster \
  --name pytest \
  --py-files hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/DSBQ.zip,hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/dependencies_short.zip \
  --files config.yml \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.driver.limit.cores=1 \
  --conf spark.executor.cores=1 \
  --conf spark.executor.memory=500m \
  --conf spark.kubernetes.container.image=${IMAGE} \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-serviceaccount \
  --conf spark.kubernetes.file.upload.path=$SOURCE_DIR \
  --conf spark.kubernetes.driver.volumes.$VOLUME_TYPE.$VOLUME_NAME.mount.path=$MOUNT_PATH \
  --conf spark.kubernetes.driver.volumes.$VOLUME_TYPE.$VOLUME_NAME.options.path=$MOUNT_PATH \
  --conf spark.kubernetes.executor.volumes.$VOLUME_TYPE.$VOLUME_NAME.mount.path=$MOUNT_PATH \
  --conf spark.kubernetes.executor.volumes.$VOLUME_TYPE.$VOLUME_NAME.options.path=$MOUNT_PATH \
  hdfs://$HDFS_HOST:$HDFS_PORT/minikube/codes/${APPLICATION}

What does it do? You put a file (in my case a file called config.yml) in $SOURCE_DIR on your driver host, and you tell spark-submit via --files config.yml to pick that file up and place it in the working directory of every executor.
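To make the mechanics concrete, here is a rough stdlib-only sketch of the staging step spark-submit appears to perform when spark.kubernetes.file.upload.path is set (an illustration inferred from the spark-upload-* directory names it creates, not Spark's actual code): each --files entry is copied into a fresh spark-upload-<uuid> directory under the upload path, and the driver later resolves the file from that staged location.

```python
import os
import shutil
import uuid

def upload_for_submit(local_file: str, upload_path: str) -> str:
    """Mimic spark-submit staging a --files entry under
    spark.kubernetes.file.upload.path (illustration only)."""
    # A fresh per-submission staging directory, like the
    # spark-upload-065d87cf-... directory seen in the listing below.
    staging_dir = os.path.join(upload_path, f"spark-upload-{uuid.uuid4()}")
    os.makedirs(staging_dir)
    dest = os.path.join(staging_dir, os.path.basename(local_file))
    shutil.copy(local_file, dest)
    # Spark rewrites the --files entry to this staged path; the driver
    # pod later resolves it, so the path must be visible inside the pod.
    return dest
```

The driver then calls the equivalent of SparkContext.addFile on the returned path, which is why a staged path that exists only on the host cannot be found from inside the pod.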
My $APPLICATION file, testpackages.py, contains this code:

import sys
import os
import pkgutil
import pkg_resources
import yaml
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext
from pyspark.sql import HiveContext
from pyspark import SparkFiles
from pyspark import SparkConf, SparkContext

def main():
    spark = SparkSession.builder \
        .enableHiveSupport() \
        .getOrCreate()
    sc = SparkContext.getOrCreate()
    sc.setLogLevel("ERROR")
    # check os path
    from os import listdir
    from os.path import isfile, join
    dirpath = "/d4T/hduser/minikube"
    onlyfiles = [f for f in listdir(dirpath) if isfile(join(dirpath, f))]
    print(onlyfiles)
    print("==> End looking at loaded files")

if __name__ == "__main__":
    main()

When it is run, it finds both files (created for each executor; see the listing below) but claims it cannot create the SparkContext. From

DRIVER_POD_NAME=`kubectl get pods -n spark | grep driver | awk '{print $1}'`
kubectl logs $DRIVER_POD_NAME -n spark

we can see the problem:

2021-07-24 10:26:26,106 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://pytest-3bbfa67ad80d2534-driver-svc.spark.svc:4040
2021-07-24 10:26:26,118 ERROR spark.SparkContext: Error initializing SparkContext.
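For comparison, application code normally locates a --files entry through pyspark.SparkFiles.get, which simply resolves the filename against a per-application root directory on each node. A stdlib-only stand-in for that lookup (resolve_file is a hypothetical helper, for illustration only, since I cannot run a cluster here):

```python
import os

def resolve_file(filename: str, root_dir: str) -> str:
    """Stand-in for pyspark.SparkFiles.get: files shipped with --files
    land in a per-application root directory, and the lookup is a
    join against that root (hypothetical helper for illustration)."""
    path = os.path.join(root_dir, filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{filename} was not distributed to {root_dir}")
    return path
```

In real application code the equivalent would be SparkFiles.get("config.yml"), with Spark supplying the root directory.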
java.io.FileNotFoundException: File file:/d4T/hduser/minikube/spark-upload-065d87cf-a1ee-4448-8199-5ec018aacfde/config.yml does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:666)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:987)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:656)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1604)
        at org.apache.spark.SparkContext.$anonfun$new$13(SparkContext.scala:508)
        at org.apache.spark.SparkContext.$anonfun$new$13$adapted(SparkContext.scala:508)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:508)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.base/java.lang.Thread.run(Unknown Source)
2021-07-24 10:26:26,125 INFO server.AbstractConnector: Stopped Spark@694ea73d{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
2021-07-24 10:26:26,126 INFO ui.SparkUI: Stopped Spark web UI at http://pytest-3bbfa67ad80d2534-driver-svc.spark.svc:4040
2021-07-24 10:26:26,142 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
2021-07-24 10:26:26,151 INFO memory.MemoryStore: MemoryStore cleared
2021-07-24 10:26:26,151 INFO storage.BlockManager: BlockManager stopped
2021-07-24 10:26:26,157 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
2021-07-24 10:26:26,157 WARN metrics.MetricsSystem: Stopping a MetricsSystem that is not running
2021-07-24 10:26:26,159 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
2021-07-24 10:26:26,207 INFO spark.SparkContext: Successfully stopped SparkContext
Traceback (most recent call last):
  File "/tmp/spark-2041787b-aee8-4bbd-a8d1-e2cc0339665e/testpackages.py", line 74, in <module>
    main()
  File "/tmp/spark-2041787b-aee8-4bbd-a8d1-e2cc0339665e/testpackages.py", line 15, in main
    spark = SparkSession.builder \
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 228, in getOrCreate
  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 384, in getOrCreate
  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 147, in __init__
  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 209, in _do_init
  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 321, in _initialize_context
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1569, in __call__
  File "/opt/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.io.FileNotFoundException: File file:/d4T/hduser/minikube/spark-upload-065d87cf-a1ee-4448-8199-5ec018aacfde/config.yml does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:666)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:987)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:656)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1604)
        at org.apache.spark.SparkContext.$anonfun$new$13(SparkContext.scala:508)
        at org.apache.spark.SparkContext.$anonfun$new$13$adapted(SparkContext.scala:508)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:508)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.base/java.lang.reflect.Constructor.newInstance(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.base/java.lang.Thread.run(Unknown Source)

However, that file is created outside, on the host, under the mount directory $SOURCE_DIR = /d4T/hduser/minikube:

ls -l /d4T/hduser/minikube/
total 20
drwxr-xr-x. 12 hduser hadoop 4096 Jul 24 10:09 ..
-rw-r--r--.  1 hduser hadoop 4433 Jul 24 10:12 config.yml
drwxr-xr-x.  3 hduser hadoop 4096 Jul 24 11:26 .
drwxr-xr-x.  2 hduser hadoop 4096 Jul 24 11:26 spark-upload-065d87cf-a1ee-4448-8199-5ec018aacfde

config.yml is the one I put there, and if we look under spark-upload-065d87cf-a1ee-4448-8199-5ec018aacfde, we see the copy:

ls -l /d4T/hduser/minikube/spark-upload-065d87cf-a1ee-4448-8199-5ec018aacfde
total 16
-rw-r--r--. 1 hduser hadoop 4433 Jul 24 11:26 config.yml

Sounds like a bug.
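One way to narrow down whether this is a genuine bug or a path-visibility problem would be to run a small check from inside the driver pod (e.g. via kubectl exec -it $DRIVER_POD_NAME -n spark -- python) and compare what the pod sees with what the host sees. A minimal diagnostic sketch, using the paths from my run:

```python
import os

def report_visibility(paths):
    """Return {path: exists} so you can compare what the host sees
    with what the driver pod sees for the same paths."""
    return {p: os.path.isdir(p) or os.path.isfile(p) for p in paths}

# Paths from the run above; run this both on the host and inside
# the driver pod and compare the two results.
checks = report_visibility([
    "/d4T/hduser/minikube",      # spark.kubernetes.file.upload.path ($SOURCE_DIR)
    "/d4T/hduser/minikube/mnt",  # $MOUNT_PATH mounted into the pods
])
print(checks)
```

Worth noting: the upload path ($SOURCE_DIR) is the parent of the mounted path ($MOUNT_PATH = $SOURCE_DIR/mnt), so the spark-upload-* directory created on the host may fall outside the subtree that the hostPath volume actually exposes to the pod.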