[jira] [Commented] (YARN-6277) Nodemanager heap memory leak
    [ https://issues.apache.org/jira/browse/YARN-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961822#comment-15961822 ]

Feng Yuan commented on YARN-6277:
---------------------------------

[~haibochen], it may not use {{LocalFileSystem.NAME}} as the key; instead, it builds the key as follows:
{code}
static class Cache {
  private final ClientFinalizer clientFinalizer = new ClientFinalizer();

  private final Map<Key, FileSystem> map = new HashMap<Key, FileSystem>();
  private final Set<Key> toAutoClose = new HashSet<Key>();

  /** A variable that makes all objects in the cache unique */
  private static AtomicLong unique = new AtomicLong(1);

  FileSystem get(URI uri, Configuration conf) throws IOException {
    // The cache key is built from the URI and the Configuration passed in.
    Key key = new Key(uri, conf);
    return getInternal(uri, conf, key);
  }
  // ... (rest of the class not quoted in this comment)
}
{code}

> Nodemanager heap memory leak
> ----------------------------
>
>                 Key: YARN-6277
>                 URL: https://issues.apache.org/jira/browse/YARN-6277
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.3
>            Reporter: Feng Yuan
>            Assignee: Feng Yuan
>         Attachments: YARN-6277.branch-2.8.001.patch
>
>
> Because of the LocalDirHandlerService@LocalDirAllocator mechanism, a massive
> number of LocalFileSystem instances are created, which leads to a heap leak.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
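The point under discussion in the comment above is whether a keyed cache keeps the number of retained instances small. The following is only a toy sketch of that general behavior, not Hadoop's FileSystem.Cache: `KeyedCacheDemo`, its `Key` fields (`scheme`, `discriminator`), and `retainedAfter` are hypothetical names chosen for illustration. It shows that lookups with equal keys share one cached object, while lookups whose keys differ each leave a new entry in the map. Which of the two cases applies inside the NodeManager is exactly what this thread is trying to establish.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Toy keyed cache; an illustration only, not Hadoop's FileSystem.Cache. */
class KeyedCacheDemo {
    /** Hypothetical composite key; the fields are assumptions for this sketch. */
    static final class Key {
        final String scheme;
        final long discriminator;

        Key(String scheme, long discriminator) {
            this.scheme = scheme;
            this.discriminator = discriminator;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return scheme.equals(k.scheme) && discriminator == k.discriminator;
        }

        @Override
        public int hashCode() {
            return Objects.hash(scheme, discriminator);
        }
    }

    private final Map<Key, Object> map = new HashMap<>();

    /** Returns the cached object for the key, creating and retaining one on a miss. */
    Object get(Key key) {
        return map.computeIfAbsent(key, k -> new Object());
    }

    int size() {
        return map.size();
    }

    /** Performs n lookups and reports how many entries the cache still holds. */
    static int retainedAfter(int n, boolean distinctKeys) {
        KeyedCacheDemo cache = new KeyedCacheDemo();
        for (int i = 0; i < n; i++) {
            cache.get(new Key("file", distinctKeys ? i : 0));
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("equal keys:    " + retainedAfter(1000, false));
        System.out.println("distinct keys: " + retainedAfter(1000, true));
    }
}
```

With equal keys the map ends up with a single entry; with distinct keys it holds one entry per lookup, and nothing in this toy cache ever removes them.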
[jira] [Commented] (YARN-6277) Nodemanager heap memory leak
    [ https://issues.apache.org/jira/browse/YARN-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961013#comment-15961013 ]

Haibo Chen commented on YARN-6277:
----------------------------------

[~Feng Yuan] If caching is enabled, there shouldn't be a massive number of LocalFileSystem instances, unless I am missing something.
[jira] [Commented] (YARN-6277) Nodemanager heap memory leak
    [ https://issues.apache.org/jira/browse/YARN-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959044#comment-15959044 ]

Feng Yuan commented on YARN-6277:
---------------------------------

[~haibochen], our setting is the default value (cache the filesystem objects), so the instances are all held in a Map, and that leads to the memory leak.

> Nodemanager heap memory leak
> ----------------------------
>
>                 Key: YARN-6277
>                 URL: https://issues.apache.org/jira/browse/YARN-6277
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.3, 2.8.1, 3.0.0-alpha2
>            Reporter: Feng Yuan
>            Assignee: Feng Yuan
>         Attachments: YARN-6277.branch-2.8.001.patch
>
>
> Because of the LocalDirHandlerService@LocalDirAllocator mechanism, a massive
> number of LocalFileSystem instances are created, which leads to a heap leak.
[jira] [Commented] (YARN-6277) Nodemanager heap memory leak
    [ https://issues.apache.org/jira/browse/YARN-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959038#comment-15959038 ]

Feng Yuan commented on YARN-6277:
---------------------------------

[~kshukla], YARN-4095 could solve this. But if ShuffleHandler could use the same Configuration object, I think that would solve it more gracefully and more cleanly. So my sticking point now is why ShuffleHandler uses its own configuration when it is initialized.
[jira] [Commented] (YARN-6277) Nodemanager heap memory leak
    [ https://issues.apache.org/jira/browse/YARN-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952157#comment-15952157 ]

Kuhu Shukla commented on YARN-6277:
-----------------------------------

[~Feng Yuan], did you see this after YARN-4095 went in? Thanks!
[jira] [Commented] (YARN-6277) Nodemanager heap memory leak
    [ https://issues.apache.org/jira/browse/YARN-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950316#comment-15950316 ]

Haibo Chen commented on YARN-6277:
----------------------------------

Thanks [~Feng Yuan] for reporting the issue and working on a patch!

If I follow you correctly, the AllocatorPerContext instance for NM_LOCAL_DIR is global, and because ShuffleHandler and NM are not sharing the same configuration object, ShuffleHandler does not see the change if the local directory is changed in NM. As a result, ShuffleHandler and NM hold different values for NM_LOCAL_DIR.
{code}
private Context confChanged(Configuration conf) throws IOException {
  if (!newLocalDirs.equals(ctx.savedLocalDirs)) {
    ctx = new Context();
    String[] dirStrings = StringUtils.getTrimmedStrings(newLocalDirs);
    ctx.localFS = FileSystem.getLocal(conf);
    ctx.savedLocalDirs = newLocalDirs;
  }
}
...
{code}
The if statement will always evaluate to true if the other component has executed confChanged() previously, so we have this thrashing issue?

Looking at the FileSystem.getLocal() implementation, though, it seems that a massive number of LocalFileSystem instances will be created only when caching for the local file system is disabled. Can you confirm whether that is your setting?
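The thrashing Haibo Chen describes above can be modeled with a minimal sketch, under stated assumptions: a plain `Map<String, String>` stands in for Hadoop's Configuration, and `ConfChangedThrashDemo`, its `Context` class, the `local.dirs` key, and `simulate` are hypothetical names for illustration, not the LocalDirAllocator code. The sketch only counts how often the shared context is rebuilt when two callers alternate; in the quoted excerpt each rebuild also calls FileSystem.getLocal(conf).

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a shared allocator context; not Hadoop's LocalDirAllocator. */
class ConfChangedThrashDemo {
    /** Stand-in for the per-context state (hypothetical). */
    static final class Context {
        String savedLocalDirs = "";
    }

    static final String KEY = "local.dirs"; // hypothetical config key name

    private Context ctx = new Context();
    private int rebuilds = 0;

    /** Rebuilds the shared context whenever the caller's value differs from the saved one. */
    Context confChanged(Map<String, String> conf) {
        String newLocalDirs = conf.get(KEY);
        if (!newLocalDirs.equals(ctx.savedLocalDirs)) {
            ctx = new Context(); // the quoted excerpt also obtains a FileSystem at this point
            ctx.savedLocalDirs = newLocalDirs;
            rebuilds++;
        }
        return ctx;
    }

    /** Alternates two callers for the given number of rounds and returns the rebuild count. */
    static int simulate(String nmDirs, String shDirs, int rounds) {
        Map<String, String> nmConf = new HashMap<>();
        nmConf.put(KEY, nmDirs);
        Map<String, String> shConf = new HashMap<>();
        shConf.put(KEY, shDirs);

        ConfChangedThrashDemo allocator = new ConfChangedThrashDemo();
        for (int i = 0; i < rounds; i++) {
            allocator.confChanged(nmConf); // first caller (NM in the thread)
            allocator.confChanged(shConf); // second caller (ShuffleHandler in the thread)
        }
        return allocator.rebuilds;
    }

    public static void main(String[] args) {
        System.out.println("same dirs:      " + simulate("/d1,/d2", "/d1,/d2", 5));
        System.out.println("different dirs: " + simulate("/d2", "/d1,/d2", 5));
    }
}
```

When both callers see the same value, the context is built once. When their values differ, every call finds a mismatch against what the other caller saved, so 5 rounds of two callers produce 10 rebuilds.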
[jira] [Commented] (YARN-6277) Nodemanager heap memory leak
    [ https://issues.apache.org/jira/browse/YARN-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948417#comment-15948417 ]

Feng Yuan commented on YARN-6277:
---------------------------------

I have attached a patch. This issue is caused by the ShuffleHandler and NodeManager configurations being inconsistent: if the NM changes the local-dir configs, the ShuffleHandler does not pick up the change.
[jira] [Commented] (YARN-6277) Nodemanager heap memory leak
    [ https://issues.apache.org/jira/browse/YARN-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896662#comment-15896662 ]

Feng Yuan commented on YARN-6277:
---------------------------------

Hi [~Naganarasimha], it goes like this:

LocalDirAllocator#confChanged checks whether the local-dir configuration has changed. The RLS periodically changes this conf when a local disk breaks down. ShuffleHandler also goes through LocalDirAllocator#confChanged, because the AllocatorPerContext is a singleton, but ShuffleHandler's conf is a cloned copy, so when the conf is changed in the NM, the ShuffleHandler's conf is still the old one. So whenever ShuffleHandler invokes LocalDirAllocator#confChanged, the NM will create a FileSystem again.

> Nodemanager heap memory leak
> ----------------------------
>
>                 Key: YARN-6277
>                 URL: https://issues.apache.org/jira/browse/YARN-6277
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Feng Yuan
>            Assignee: Feng Yuan
>
> Because of the LocalDirHandlerService@LocalDirAllocator mechanism, a massive
> number of LocalFileSystem instances are created, which leads to a heap leak.
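The stale-copy effect described in the comment above can be shown with a small sketch, under stated assumptions: a `HashMap` stands in for Hadoop's Configuration, and `ClonedConfDemo`, `afterUpdate`, and the `local.dirs` key are hypothetical names used only for illustration. It compares what a holder of the same object sees with what a holder of a copy sees after the owner updates the value.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy illustration of a shared reference versus a cloned config; not Hadoop code. */
class ClonedConfDemo {
    static final String KEY = "local.dirs"; // hypothetical config key name

    /**
     * Returns the value seen through a shared reference (index 0) and through
     * a copy (index 1) after the owner updates its configuration.
     */
    static String[] afterUpdate() {
        Map<String, String> nmConf = new HashMap<>();
        nmConf.put(KEY, "/d1,/d2");

        Map<String, String> shared = nmConf;                 // same object as the owner's
        Map<String, String> cloned = new HashMap<>(nmConf);  // snapshot taken at init time

        nmConf.put(KEY, "/d2"); // owner drops a failed disk from the list

        return new String[] { shared.get(KEY), cloned.get(KEY) };
    }

    public static void main(String[] args) {
        String[] seen = afterUpdate();
        System.out.println("shared reference sees: " + seen[0]);
        System.out.println("cloned copy sees:      " + seen[1]);
    }
}
```

The shared reference sees the updated value `/d2`, while the copy still reports `/d1,/d2`. This is the mismatch that a component holding a cloned configuration would feed into a shared change check.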
[jira] [Commented] (YARN-6277) Nodemanager heap memory leak
    [ https://issues.apache.org/jira/browse/YARN-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15894451#comment-15894451 ]

Naganarasimha G R commented on YARN-6277:
-----------------------------------------

Hi [~Feng Yuan], can you give more details about the issue, such as a heap dump? And is this seen only in trunk?