Hi all,

I am working in Databricks. When I submit a Spark job with the --py-files
argument, it seems the first two files are read in but the third is ignored.

"--py-files",
"s3://some_path/appl_src.py",
"s3://some_path/main.py",
"s3://a_different_path/common.py",

I can see the first two acknowledged in the Log4j output, but not the third.

24/02/28 21:41:00 INFO Utils: Fetching s3://some_path/appl_src.py to ...
24/02/28 21:41:00 INFO Utils: Fetching s3://some_path/main.py to ...

As a result, the job fails because appl_src.py imports from common.py but
can't find it.
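
For reference, the import in appl_src.py is an ordinary top-level import,
roughly like this (the helper name here is hypothetical):

# appl_src.py
# common.py should be importable because it was shipped via --py-files
from common import some_helper  # fails: ModuleNotFoundError: No module named 'common'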

I posted to both the Databricks community
(https://community.databricks.com/t5/data-engineering/spark-submit-not-reading-one-of-my-py-files-arguments/m-p/62361#M31953)
and Stack Overflow
(https://stackoverflow.com/questions/78077822/databricks-spark-submit-getting-error-with-py-files)
but did not get a response.

I'm aware that we could use a .zip file, so I tried zipping the first two
files, but then got a completely different error:

"Exception in thread "main" org.apache.spark.SparkException: Failed to get main 
class in JAR with error 'null'.  Please specify one with --class."
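
Concretely, the arguments after zipping looked roughly like this (the zip
file name is made up; appl_src.py and main.py are bundled inside it, with
common.py still separate):

"--py-files",
"s3://some_path/app.zip",
"s3://a_different_path/common.py",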

Basically I just want the application code in one S3 path and a "common"
utilities package in another path, i.e.:
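
s3://some_path/              <- application code
    appl_src.py
    main.py
s3://a_different_path/       <- shared "common" utilities
    common.py

Thanks for your help.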



Kind regards,
Chuck Pedro

