Re: [External Email] Re: About /mnt/hdfs/current/BP directories

2023-09-08 Thread Nebi Aydin
Usually the job never reaches that point; it fails during shuffle. Storage memory and executor memory are usually low when it fails. On Fri, Sep 8, 2023 at 16:49 Jack Wells wrote: > Assuming you’re not writing to HDFS in your code, Spark can spill to HDFS > if it runs out of memory on a per-executor

Re: [External Email] Re: About /mnt/hdfs/current/BP directories

2023-09-08 Thread Jack Wells
Assuming you’re not writing to HDFS in your code, Spark can spill to HDFS if it runs out of memory on a per-executor basis. This could happen when evaluating a cache operation like you have below or during shuffle operations in joins, etc. You might try to increase executor memory, tune shuffle
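
The memory and shuffle tuning suggested above could be passed to spark-submit along these lines. This is only a minimal sketch: the property names are standard Spark settings, but the values and the job.py script name are hypothetical placeholders, not recommendations made in the thread.

```python
# Hypothetical starting values; tune them for your own cluster and workload.
SPARK_TUNING = {
    "spark.executor.memory": "8g",
    "spark.executor.memoryOverhead": "2g",
    "spark.sql.shuffle.partitions": "400",
}


def to_submit_args(conf):
    """Turn a dict of Spark properties into spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # job.py is a placeholder for the actual application script.
    print(" ".join(["spark-submit", *to_submit_args(SPARK_TUNING), "job.py"]))
```

Keeping the settings in one dict makes it easier to change a single value between runs and compare how the shuffle behaves.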

Re: [External Email] Re: About /mnt/hdfs/current/BP directories

2023-09-08 Thread Nebi Aydin
Sure
df = spark.read.option("basePath", some_path).parquet(*list_of_s3_file_paths())
(
    df
    .where(SOME FILTER)
    .repartition(6)
    .cache()
)
On Fri, Sep 8, 2023 at 14:56 Jack Wells wrote: > Hi Nebi, can you share the code you’re using to read and write from S3? > > On Sep 8,

Re: About /mnt/hdfs/current/BP directories

2023-09-08 Thread Jack Wells
Hi Nebi, can you share the code you’re using to read and write from S3? On Sep 8, 2023 at 10:59:59, Nebi Aydin wrote: > Hi all, > I am using Spark on EMR to process data. Basically, I read data from AWS S3, > transform it, and then write the results back to S3. >

About /mnt/hdfs/current/BP directories

2023-09-08 Thread Nebi Aydin
Hi all, I am using Spark on EMR to process data. Basically, I read data from AWS S3, transform it, and then write the results back to S3. Recently we have found that HDFS (/mnt/hdfs) utilization is going too high. I disabled `yarn.log-aggregation-enable` by setting it

RE: Spark 3.4.1 and Hive 3.1.3

2023-09-08 Thread Agrawal, Sanket
Hi Yasukazu, I tried replacing the jar; the Spark code didn’t work, but the vulnerability was removed. I agree that even 3.1.3 has other vulnerabilities listed on the Maven page, but these are medium-level vulnerabilities. We are currently targeting Critical and High vulnerabilities

Re: Elasticsearch support for Spark 3.x

2023-09-08 Thread Dipayan Dev
@Alfie Davidson : Awesome, it worked with "org.elasticsearch.spark.sql". But as soon as I switched to *elasticsearch-spark-20_2.12*, "es" also worked. On Fri, Sep 8, 2023 at 12:45 PM Dipayan Dev wrote: > > Let me try that and get back. Just wondering, is there a change in the > way we pass

Re: Elasticsearch support for Spark 3.x

2023-09-08 Thread Dipayan Dev
Let me try that and get back. Just wondering, is there a change in the way we pass the format to the connector from Spark 2 to 3? On Fri, 8 Sep 2023 at 12:35 PM, Alfie Davidson wrote: > I am pretty certain you need to change the write.format from “es” to > “org.elasticsearch.spark.sql” > > Sent
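
For reference, the format change discussed in this thread might look like the following in PySpark. This is a sketch under assumptions: the write_to_es helper, the index name, and the localhost:9200 node address are hypothetical, and an elasticsearch-spark connector jar matching your Spark and Scala versions is assumed to be on the classpath.

```python
def write_to_es(df, index, nodes="localhost:9200"):
    """Write a Spark DataFrame to Elasticsearch using the full format name."""
    (
        df.write
        # Fully qualified data source name instead of the short "es" alias.
        .format("org.elasticsearch.spark.sql")
        # es.nodes is the connector setting for the Elasticsearch host(s).
        .option("es.nodes", nodes)
        .mode("append")
        .save(index)
    )
```

Calling write_to_es(df, "my-index") on an existing DataFrame would append its rows to that index; whether the short "es" alias also resolves appears to depend on which connector artifact is in use, as the thread reports.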