[ https://issues.apache.org/jira/browse/YARN-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150717#comment-16150717 ]
Rohith Sharma K S commented on YARN-7147:
-----------------------------------------

Thanks [~jlowe] for clarifying it. It makes sense to me. I have closed it as a duplicate.

> ATS1.5 crash due to OOM
> -----------------------
>
>                 Key: YARN-7147
>                 URL: https://issues.apache.org/jira/browse/YARN-7147
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: timelineserver
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>         Attachments: Screen Shot - suspect-1.png, Screen Shot - suspect-2.png
>
>
> It is observed that in a production cluster, even though _app-cache-size_ is
> set to a minimal value, i.e. less than 5, the ATS server goes down with OOM.
> The _entity-group-fs-store.cache-store-class_ is configured with
> MemoryTimelineStore, which is the default. The heap size configured for the
> ATS daemon is 8GB.
> This is because ATS parses the entity log file per domain and caches it. If
> the domain has a lot of entity information, the in-memory cache store loads
> all of it, which causes OOM. After a restart, it caches the same domain again
> and goes OOM.
> Possible ways to handle it are:
> # Threshold the number of entities loaded into the in-memory cache. This can
> still lead to OOM if the data size is huge.
> # Threshold based on the data size in the store.
> We faced the first case, where the number of entities is very large.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org