Hi all,

I am using Spark on EMR to process data. Basically I read data from AWS S3, apply transformations, and after the transformations I write the data back to S3.
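
For context, the job looks roughly like this (a minimal sketch only; the bucket names, columns, and transformation are placeholders, not our actual code):

```
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("s3-to-s3-etl").getOrCreate()

# Read the raw input directly from S3 (bucket/path are placeholders).
df = spark.read.parquet("s3://my-input-bucket/raw/")

# Placeholder transformation: filter and aggregate, standing in for our real logic.
result = (df.filter(F.col("status") == "ok")
            .groupBy("customer_id")
            .agg(F.sum("amount").alias("total_amount")))

# Write the result back to S3; there are no explicit HDFS writes anywhere.
result.write.mode("overwrite").parquet("s3://my-output-bucket/processed/")
```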

Recently we found that HDFS (/mnt/hdfs) utilization is growing too high.

I disabled log aggregation by setting `yarn.log-aggregation-enable` to false.
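
For reference, I applied it through an EMR configuration classification (the property name is the real YARN setting; the wrapper below is the standard EMR classification format):

```
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.log-aggregation-enable": "false"
    }
  }
]
```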

I am not writing any data to HDFS (/mnt/hdfs), yet it looks like Spark is creating blocks and writing data into it. We are doing all the operations in memory.

Is there any specific operation that writes data to the DataNode (HDFS)?

Here are the HDFS directories created:

```
15.4G  /mnt/hdfs/current/BP-6706123673-10.xx.xx.xxx-1588026945812/current/finalized/subdir1
129G   /mnt/hdfs/current/BP-6706123673-10.xx.xx.xxx-1588026945812/current/finalized
129G   /mnt/hdfs/current/BP-6706123673-10.xx.xx.xxx-1588026945812/current
129G   /mnt/hdfs/current/BP-6706123673-10.xx.xx.xxx-1588026945812
129G   /mnt/hdfs/current
129G   /mnt/hdfs
```
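
To see which HDFS paths those block-pool bytes actually map to, I was going to check from the namespace side with something like the following (standard HDFS CLI, run on the master node; the Spark event-log path is the usual EMR default, so that part is an assumption on my part):

```
# Per-directory usage from the HDFS namespace side, to trace the
# block-pool bytes back to actual paths.
hdfs dfs -du -h /

# Common suspects on EMR: Spark event logs and per-user staging dirs
# (paths may differ by EMR release).
hdfs dfs -du -h /var/log/spark/apps
hdfs dfs -du -h /user
```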

