Usually the job never reaches that point; it fails during the shuffle. And
storage memory and executor memory are usually low at the time it fails.
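One way to confirm that per-executor picture is Spark's monitoring REST API,
which reports memory and shuffle totals per executor. A minimal sketch,
assuming the driver UI is reachable on localhost:4040; the application id is
a placeholder:

    # Hedged sketch: poll Spark's monitoring REST API for per-executor memory.
    # The host/port and app id below are assumptions; substitute the driver UI
    # address and the id listed under /api/v1/applications.
    import requests

    app_id = "app-20230908-0001"  # hypothetical application id
    base = f"http://localhost:4040/api/v1/applications/{app_id}"

    for ex in requests.get(f"{base}/executors").json():
        print(
            ex["id"],
            f"memoryUsed={ex['memoryUsed']}",
            f"maxMemory={ex['maxMemory']}",
            f"shuffleWrite={ex['totalShuffleWrite']}",
        )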
On Fri, Sep 8, 2023 at 16:49 Jack Wells wrote:
Assuming you’re not writing to HDFS in your code, Spark can spill to local
disk if it runs out of memory on a per-executor basis. This can happen when
evaluating a cache operation like the one you have below, or during shuffle
operations in joins, etc. You might try to increase executor memory or tune
your shuffle settings.
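A minimal sketch of what that tuning could look like at session creation; the
sizes and partition count below are illustrative assumptions, not
recommendations for this workload:

    from pyspark.sql import SparkSession

    # Hedged sketch: placeholder values showing which knobs exist,
    # not tuned settings for this job.
    spark = (
        SparkSession.builder
        .appName("s3-parquet-job")                      # hypothetical app name
        .config("spark.executor.memory", "8g")          # larger per-executor heap
        .config("spark.executor.memoryOverhead", "2g")  # headroom for off-heap and shuffle buffers
        .config("spark.sql.shuffle.partitions", "400")  # spread shuffle data over more partitions
        .getOrCreate()
    )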
Sure
df = spark.read.option("basePath", some_path).parquet(*list_of_s3_file_paths())

(
    df
    .where(SOME_FILTER)  # placeholder for the real predicate
    .repartition(6)
    .cache()
)
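One thing worth checking in the snippet above: .where(...).repartition(6).cache()
returns a new DataFrame, and since that result is never assigned, later uses
of df won’t hit the cache. A minimal sketch of keeping a handle on it, with
SOME_FILTER still standing in for the real predicate:

    # Hedged sketch: assign the cached result so it is actually reused.
    # some_path, list_of_s3_file_paths and SOME_FILTER are the placeholders
    # from the snippet above.
    df = (
        spark.read.option("basePath", some_path)
        .parquet(*list_of_s3_file_paths())
    )

    filtered = (
        df
        .where(SOME_FILTER)
        .repartition(6)
        .cache()
    )
    filtered.count()  # materialize the cache once, up front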
On Fri, Sep 8, 2023 at 14:56 Jack Wells wrote:
> Hi Nebi, can you share the code you’re using to read from and write to S3?