Usually the job never reaches that point; it fails during shuffle. Storage memory and executor memory are usually low when it fails.

On Fri, Sep 8, 2023 at 16:49 Jack Wells <j...@tecton.ai.invalid> wrote:
> Assuming you’re not writing to HDFS in your code, Spark can spill to HDFS
> if it runs out of memory on a per-executor basis. This could happen when
> evaluating a cache operation like the one you have below, or during shuffle
> operations in joins, etc. You might try increasing executor memory, tuning
> shuffle operations, avoiding caching, or reducing the size of your
> dataframe(s).
>
> Jack
>
> On Sep 8, 2023 at 12:43:07, Nebi Aydin <nayd...@binghamton.edu.invalid> wrote:
>
>> Sure:
>>
>> df = spark.read.option("basePath",
>>     some_path).parquet(*list_of_s3_file_paths())
>> (
>>     df
>>     .where(SOME FILTER)
>>     .repartition(60000)
>>     .cache()
>> )
>>
>> On Fri, Sep 8, 2023 at 14:56 Jack Wells <j...@tecton.ai.invalid> wrote:
>>
>>> Hi Nebi, can you share the code you’re using to read and write from S3?
>>>
>>> On Sep 8, 2023 at 10:59:59, Nebi Aydin <nayd...@binghamton.edu.invalid> wrote:
>>>
>>>> Hi all,
>>>> I am using Spark on EMR to process data. Basically, I read data from AWS
>>>> S3, do the transformations, and after the transformations I load/write the
>>>> data back to S3.
>>>>
>>>> Recently we found that HDFS (/mnt/hdfs) utilization is getting too high.
>>>>
>>>> I disabled `yarn.log-aggregation-enable` by setting it to false.
>>>>
>>>> I am not writing any data to HDFS (/mnt/hdfs); however, it seems that
>>>> Spark is creating blocks and writing data into them. We are doing all the
>>>> operations in memory.
>>>>
>>>> Is any specific operation writing data to the datanode (HDFS)?
>>>>
>>>> Here are the HDFS dirs created:
>>>>
>>>> ```
>>>> 15.4G  /mnt/hdfs/current/BP-6706123673-10.xx.xx.xxx-1588026945812/current/finalized/subdir1
>>>> 129G   /mnt/hdfs/current/BP-6706123673-10.xx.xx.xxx-1588026945812/current/finalized
>>>> 129G   /mnt/hdfs/current/BP-6706123673-10.xx.xx.xxx-1588026945812/current
>>>> 129G   /mnt/hdfs/current/BP-6706123673-10.xx.xx.xxx-1588026945812
>>>> 129G   /mnt/hdfs/current
>>>> 129G   /mnt/hdfs
>>>> ```
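One likely contributor to the spill discussed above is the `repartition(60000)` call: for a dataset in the ~129 GB range, 60,000 partitions means only a couple of MB each, and the full shuffle behind `repartition` writes intermediate files to disk. A minimal sketch of sizing the partition count from a target partition size instead — the helper name and the 128 MB target are illustrative assumptions, not something from the thread:

```python
import math

def estimate_partitions(total_bytes: int, target_mb: int = 128) -> int:
    """Suggest a partition count so each partition holds ~target_mb of data.

    Both this helper and the 128 MB default are assumptions for
    illustration; tune the target to your executors' memory.
    """
    target_bytes = target_mb * 1024 * 1024
    return max(1, math.ceil(total_bytes / target_bytes))

# For a ~129 GiB dataset this suggests far fewer than 60000 partitions:
n = estimate_partitions(129 * 1024**3)  # 1032
# df.repartition(n) instead of df.repartition(60000)
```

A smaller partition count means fewer, larger shuffle files and less per-partition overhead; whether it reduces the on-disk usage here depends on what is actually writing the HDFS blocks.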