[ 
https://issues.apache.org/jira/browse/YARN-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681525#comment-16681525
 ] 

Thomas Graves commented on YARN-8991:
-------------------------------------

[~teonadi] can you clarify here.  Are you saying its not getting cleaned up 
while the Spark application is still running or its not getting cleaned up 
after the spark application finishes?

> nodemanager not cleaning blockmgr directories inside appcache 
> --------------------------------------------------------------
>
>                 Key: YARN-8991
>                 URL: https://issues.apache.org/jira/browse/YARN-8991
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Hidayat Teonadi
>            Priority: Major
>         Attachments: yarn-nm-log.txt
>
>
> Hi, I'm running spark on yarn and have enabled the Spark Shuffle Service. I'm 
> noticing that during the lifetime of my spark streaming application, the nm 
> appcache folder is building up with blockmgr directories (filled with 
> shuffle_*.data).
> Looking into the nm logs, it seems like the blockmgr directories is not part 
> of the cleanup process of the application. Eventually disk will fill up and 
> app will crash. I have both 
> {{yarn.nodemanager.localizer.cache.cleanup.interval-ms}} and 
> {{yarn.nodemanager.localizer.cache.target-size-mb}} set, so I don't think its 
> a configuration issue.
> What is stumping me is the executor ID listed by spark during the external 
> shuffle block registration doesn't match the executor ID listed in yarn's nm 
> log. Maybe this executorID disconnect explains why the cleanup is not done ? 
> I'm assuming that blockmgr directories are supposed to be cleaned up ?
>  
> {noformat}
> 2018-11-05 15:01:21,349 INFO 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: Registered 
> executor AppExecId{appId=application_1541045942679_0193, execId=1299} with 
> ExecutorShuffleInfo{localDirs=[/mnt1/yarn/nm/usercache/auction_importer/appcache/application_1541045942679_0193/blockmgr-b9703ae3-722c-47d1-a374-abf1cc954f42],
>  subDirsPerLocalDir=64, 
> shuffleManager=org.apache.spark.shuffle.sort.SortShuffleManager}
>  {noformat}
>  
> seems similar to https://issues.apache.org/jira/browse/YARN-7070, although 
> I'm not sure if the behavior I'm seeing is spark use related.
> [https://stackoverflow.com/questions/52923386/spark-streaming-job-doesnt-delete-shuffle-files]
>  has a stop gap solution of cleaning up via cron.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to