Hi Nebi, can you share the code you’re using to read and write from S3? On Sep 8, 2023 at 10:59:59, Nebi Aydin <nayd...@binghamton.edu.invalid> wrote:
> Hi all, > I am using spark on EMR to process data. Basically i read data from AWS S3 > and do the transformation and post transformation i am loading/writing data > to s3. > > Recently we have found that hdfs(/mnt/hdfs) utilization is going too high. > > I disabled `yarn.log-aggregation-enable` by setting it to False. > > I am not writing any data to hdfs(/mnt/hdfs) however is that spark is > creating blocks and writing data into it. We are going all the operations > in memory. > > Any specific operation writing data to datanode(HDFS)? > > Here is the hdfs dirs created. > > ``` > > *15.4G > /mnt/hdfs/current/BP-6706123673-10.xx.xx.xxx-1588026945812/current/finalized/subdir1 > > 129G > /mnt/hdfs/current/BP-6706123673-10.xx.xx.xxx-1588026945812/current/finalized > > 129G /mnt/hdfs/current/BP-6706123673-10.xx.xx.xxx-1588026945812/current > > 129G /mnt/hdfs/current/BP-6706123673-10.xx.xx.xxx-1588026945812 > > 129G /mnt/hdfs/current 129G /mnt/hdfs* > > ``` > > > <https://stackoverflow.com/collectives/aws> >