Hi Nebi, can you share the code you’re using to read and write from S3?

On Sep 8, 2023 at 10:59:59, Nebi Aydin <nayd...@binghamton.edu.invalid>
wrote:

> Hi all,
> I am using Spark on EMR to process data. Basically, I read data from AWS S3,
> apply transformations, and then write the transformed data back to
> S3.
>
> Recently we found that HDFS (/mnt/hdfs) utilization is getting too high.
>
> I disabled `yarn.log-aggregation-enable` by setting it to False.
>
> I am not writing any data to HDFS (/mnt/hdfs); however, Spark appears to be
> creating blocks and writing data into it. We are doing all the operations
> in memory.
>
> Is any specific operation writing data to the DataNode (HDFS)?
>
> Here are the HDFS directories that were created:
>
> ```
> 15.4G /mnt/hdfs/current/BP-6706123673-10.xx.xx.xxx-1588026945812/current/finalized/subdir1
> 129G /mnt/hdfs/current/BP-6706123673-10.xx.xx.xxx-1588026945812/current/finalized
> 129G /mnt/hdfs/current/BP-6706123673-10.xx.xx.xxx-1588026945812/current
> 129G /mnt/hdfs/current/BP-6706123673-10.xx.xx.xxx-1588026945812
> 129G /mnt/hdfs/current
> 129G /mnt/hdfs
> ```
>
>
