Hey Mich,
Thanks for the detailed response. I get most of these options.
However, what we are trying to do is avoid having to upload the source
configs and pyspark.zip files to the cluster every time we execute the job
using spark-submit. Here is the code that does it:
https://github.com/apache/s
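For reference, a minimal sketch of what we are after (all paths are
illustrative, not our actual layout): stage the Python archives on HDFS
once, then point the YARN client at the staged copies so spark-submit
stops re-uploading them on every run:

    # One-time staging of the Python archives (illustrative paths)
    hdfs dfs -mkdir -p /spark/archives
    hdfs dfs -put $SPARK_HOME/python/lib/pyspark.zip /spark/archives/
    hdfs dfs -put $SPARK_HOME/python/lib/py4j-0.10.9.7-src.zip /spark/archives/

    # Every later submit references the staged copies instead of uploading
    export PYSPARK_ARCHIVES_PATH=hdfs:///spark/archives/pyspark.zip,hdfs:///spark/archives/py4j-0.10.9.7-src.zip
    spark-submit --master yarn --deploy-mode cluster my_job.py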
Hi Eugene,
With regard to your points
What are the PYTHONPATH and SPARK_HOME env variables in your script?
OK, let us look at a typical structure for one of my Spark projects:
- project_root
|-- README.md
|-- __init__.py
|-- conf
| |-- (configuration files for Spark)
|-- deployment
| |-- d
Setting PYSPARK_ARCHIVES_PATH to hdfs:// did the trick. But I don't
understand a few things:
1) The default behaviour is that if PYSPARK_ARCHIVES_PATH is empty,
pyspark.zip is uploaded from the local SPARK_HOME. If it is set to
"local://", the upload is skipped. I would expect the latter to be the
default.
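For illustration, the "local://" variant I would have expected as the
default looks like this (it assumes Spark is installed at the same path
on every NodeManager host; /opt/spark is just an example):

    # Skip the upload entirely; each node reads the archives from its
    # own local Spark install
    export PYSPARK_ARCHIVES_PATH=local:///opt/spark/python/lib/pyspark.zip,local:///opt/spark/python/lib/py4j-0.10.9.7-src.zip
    spark-submit --master yarn --deploy-mode cluster my_job.py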
Thanks Mich,
Tried this and still getting
INFO Client: "Uploading resource
file:/opt/spark/spark-3.5.0-bin-hadoop3/python/lib/pyspark.zip ->
hdfs:/". It is also doing it for (py4j-0.10.9.7-src.zip and
__spark_conf__.zip). It is working now because I enabled direct
access to HDFS to allow copying t
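(Side note: the Spark jars have an analogous knob. If I read the docs
right, spark.yarn.archive can point at a pre-staged bundle so the jars
are not re-uploaded either; a sketch with illustrative paths is below.
As far as I can tell, __spark_conf__.zip is always generated and
uploaded by the YARN client regardless.)

    # One-time: bundle and stage the Spark jars (illustrative paths)
    jar cv0f spark-libs.jar -C $SPARK_HOME/jars/ .
    hdfs dfs -put spark-libs.jar /spark/

    # Reference the staged bundle on each submit
    spark-submit --master yarn --deploy-mode cluster \
      --conf spark.yarn.archive=hdfs:///spark/spark-libs.jar \
      my_job.py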
hdfs-site.xml, for instance,
fs.oss.impl, etc.
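(If shipping the XML files is the concern, one alternative worth
trying is to pass the same filesystem settings on the command line;
Spark forwards any spark.hadoop.* conf into the Hadoop Configuration.
The class name below is the stock Aliyun OSS implementation and is
only an example:)

    # Illustrative: forward a filesystem setting without editing
    # hdfs-site.xml on the submit host
    spark-submit --master yarn --deploy-mode cluster \
      --conf spark.hadoop.fs.oss.impl=org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem \
      my_job.py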
eabour
From: Eugene Miretsky
Date: 2023-11-16 09:58
To: eab...@163.com
CC: Eugene Miretsky; user @spark
Subject: Re: [EXTERNAL] Re: Spark-submit without access to HDFS
Hey!
Thanks for the response.
We are getting the error because there is no network connectivity to the
data nodes - that's expected.
What I am trying to find out is WHY we need access to the data nodes, and
if there is a way to submit a job without it.
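For concreteness, my mental model of the staging step (roughly what
the YARN client does under the hood; the paths are illustrative):

    # Equivalent of the client's resource upload:
    # 1. NameNode RPC allocates blocks        (NameNode access is enough)
    # 2. Block data streams to DataNodes      (this is the part that
    #                                          needs direct DataNode
    #                                          connectivity)
    hdfs dfs -put $SPARK_HOME/python/lib/pyspark.zip /user/$USER/.sparkStaging/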
Cheers,
Eugene
On Wed, Nov 15, 2023 at 7: