Re: Re: [EXTERNAL] Re: Spark-submit without access to HDFS

2023-11-15 Thread eab...@163.com
Hi Eugene, As the logs indicate, when executing spark-submit, Spark packages and uploads spark/conf to HDFS, along with spark/jars. These files are uploaded to HDFS unless you specify that they be uploaded to another object store (OSS). To do so, you'll need to modify the configuration in hdfs-site.xml
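For reference, the jar/archive upload described above can also be redirected with Spark's own YARN properties rather than HDFS configuration alone. A hedged sketch of the relevant spark-defaults.conf entries; all paths are placeholders:

```properties
# spark-defaults.conf (sketch; paths are placeholders, adjust to your cluster)

# Point executors at a pre-staged archive of Spark jars so spark-submit
# does not re-upload them on every submission.
spark.yarn.archive        hdfs:///apps/spark/spark-jars.zip

# Alternatively, list the jars directory directly (use either this or
# spark.yarn.archive, not both).
# spark.yarn.jars         hdfs:///apps/spark/jars/*

# Staging directory for files spark-submit still needs to upload.
spark.yarn.stagingDir     hdfs:///user/spark/staging
```

With `spark.yarn.archive` set, only the application-specific files are uploaded at submit time.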

Re: [EXTERNAL] Re: Spark-submit without access to HDFS

2023-11-15 Thread Eugene Miretsky
Hey! Thanks for the response. We are getting the error because there is no network connectivity to the data nodes; that's expected. What I am trying to find out is WHY we need access to the data nodes, and whether there is a way to submit a job without it. Cheers, Eugene

Re: Spark-submit without access to HDFS

2023-11-15 Thread eab...@163.com
Hi Eugene, I think you should check whether the HDFS service is running properly. From the logs, it appears that there are two datanodes in HDFS, but neither of them is healthy. Please investigate why the datanodes are not functioning properly. It seems that the issue might be due to
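A quick way to check the datanode state suggested above (a sketch; assumes shell access to a host with the HDFS client configured for this cluster):

```shell
# Summarize live/dead datanodes and their reported capacity.
hdfs dfsadmin -report

# Check filesystem health, missing blocks, and block locations.
hdfs fsck / -blocks -locations
```

Both commands run against the active namenode and will show whether the two datanodes from the logs are registered and healthy.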

Spark-submit without access to HDFS

2023-11-15 Thread Eugene Miretsky
Hey All, We are running PySpark spark-submit from a client outside the cluster. The client has network connectivity only to the YARN master, not the HDFS datanodes. How can we submit the jobs? The idea would be to preload all the dependencies (job code, libraries, etc.) to HDFS, and just submit the
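One common shape for the preloading idea above (a sketch; HDFS paths and file names are placeholders): stage the dependencies once from inside the cluster, then submit in cluster mode so the driver runs on the cluster and the client mainly needs to reach YARN:

```shell
# One-time: stage job code and libraries where the cluster can read them.
hdfs dfs -put deps.zip  hdfs:///apps/myjob/deps.zip
hdfs dfs -put job.py    hdfs:///apps/myjob/job.py

# Submit in cluster mode: the driver starts inside the cluster, and
# pre-staged hdfs:// paths are not re-uploaded by the client.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.archive=hdfs:///apps/spark/spark-jars.zip \
  --py-files hdfs:///apps/myjob/deps.zip \
  hdfs:///apps/myjob/job.py
```

Note the caveat from the earlier reply: spark-submit may still upload a conf archive to the staging directory, so a fully datanode-free client is not guaranteed by this alone.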

[Spark Structured Streaming] Two sink from Single stream

2023-11-15 Thread Subash Prabanantham
Hi Team, I am working on a basic streaming aggregation where I have one file-stream source and two write sinks (Hudi tables). The only difference is that the aggregation performed differs, hence I am using the same Spark session to perform both operations. (File Source) --> Agg1 -> DF1 -->
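A minimal sketch of the shape described above (assumes an active SparkSession with the Hudi bundle on the classpath; all paths, schemas, and column names are illustrative placeholders):

```python
# Sketch: one file-stream source feeding two differently aggregated sinks.
from pyspark.sql import functions as F

src = (spark.readStream
            .schema("user STRING, amount DOUBLE, ts TIMESTAMP")
            .json("/data/incoming"))

# Aggregation 1: total amount per user.
agg1 = src.groupBy("user").agg(F.sum("amount").alias("total"))

# Aggregation 2: event count per user.
agg2 = src.groupBy("user").agg(F.count("*").alias("events"))

# Each writeStream.start() launches an independent query; the file source
# is consumed separately by each, so they need distinct checkpoints.
q1 = (agg1.writeStream.format("hudi")
          .outputMode("complete")
          .option("checkpointLocation", "/chk/agg1")
          .start("/tables/hudi_agg1"))

q2 = (agg2.writeStream.format("hudi")
          .outputMode("complete")
          .option("checkpointLocation", "/chk/agg2")
          .start("/tables/hudi_agg2"))
```

Because the two queries read the source independently, sharing the session does not share the scan; `foreachBatch` is sometimes used instead to compute both aggregations from a single read.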

The job failed when we upgraded from spark 3.3.1 to spark3.4.1

2023-11-15 Thread Hanyu Huang
Our job originally ran on Spark 3.3.1 with Apache Iceberg 1.2.0, but since we upgraded to Spark 3.4.1 and Apache Iceberg 1.3.1, jobs have started to fail frequently. We tried upgrading only Iceberg without upgrading Spark, and the job did not report an error. Detailed description: