[ 
https://issues.apache.org/jira/browse/YARN-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150553#comment-16150553
 ] 

Rohith Sharma K S commented on YARN-7147:
-----------------------------------------

Instead of the in-memory cache store, other cache store implementations are 
available, namely Leveldb or rolling_level_db. I am not sure what the 
performance impact of switching would be, though.
cc:/ [~jlowe] 
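
For reference, switching the cache store would be a yarn-site.xml change 
along the lines below. This is only a sketch: the full property names assume 
the usual yarn.timeline-service. prefix on the names quoted in the 
description, and the leveldb-backed class name is an assumption that should 
be checked against the classes shipped in the Hadoop release in use.

```xml
<!-- Sketch only: verify property and class names against yarn-default.xml -->
<property>
  <!-- Default is org.apache.hadoop.yarn.server.timeline.MemoryTimelineStore -->
  <name>yarn.timeline-service.entity-group-fs-store.cache-store-class</name>
  <!-- Assumed leveldb-backed alternative; confirm it exists in your release -->
  <value>org.apache.hadoop.yarn.server.timeline.LevelDBCacheTimelineStore</value>
</property>
<property>
  <!-- Number of app caches kept; the report below used a value under 5 -->
  <name>yarn.timeline-service.entity-group-fs-store.app-cache-size</name>
  <value>4</value>
</property>
```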

> ATS1.5 crash due to OOM
> -----------------------
>
>                 Key: YARN-7147
>                 URL: https://issues.apache.org/jira/browse/YARN-7147
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: timelineserver
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>         Attachments: Screen Shot - suspect-1.png, Screen Shot - suspect-2.png
>
>
> It is observed that in a production cluster, even though _app-cache-size_ is 
> set to a minimal value, i.e. less than 5, the ATS server goes down with OOM. 
> The _entity-group-fs-store.cache-store-class_ is configured with 
> MemoryTimelineStore, which is the default. The heap size configured for the 
> ATS daemon is 8GB. 
> This is because ATS parses the entity log file per domain and caches it. If 
> the domain has a lot of entity information, the in-memory cache store loads 
> all of it, which causes the OOM. After a restart, ATS caches the same domain 
> again and goes OOM again. 
> Possible ways to handle it are:
> # Threshold the number of entities loaded into the in-memory cache. This can 
> still lead to OOM if the data size is huge. 
> # Threshold based on the data size in the store. 
> We hit the first case, where the number of entities is very large.
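
The two thresholds listed in the description could be combined roughly as 
in the following sketch. It is not taken from the ATS code base: the class 
BoundedCacheLoader, its load method, and the use of plain strings as stand-ins 
for timeline entities are all hypothetical, and the byte count is only a crude 
proxy for real heap usage.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of a bounded cache load; not part of ATS. */
public class BoundedCacheLoader {

    /** What was cached, how many bytes it took, and whether input was cut short. */
    public static final class LoadResult {
        public final List<String> entities;
        public final long bytesLoaded;
        public final boolean truncated;

        LoadResult(List<String> entities, long bytesLoaded, boolean truncated) {
            this.entities = entities;
            this.bytesLoaded = bytesLoaded;
            this.truncated = truncated;
        }
    }

    /**
     * Loads entities until either the entity-count threshold or the
     * data-size threshold would be exceeded, then stops.
     */
    public static LoadResult load(Iterator<String> source, int maxEntities, long maxBytes) {
        List<String> loaded = new ArrayList<>();
        long bytes = 0;
        while (source.hasNext()) {
            String entity = source.next();
            long size = entity.getBytes(StandardCharsets.UTF_8).length;
            // Stop before the cache grows past either limit.
            if (loaded.size() >= maxEntities || bytes + size > maxBytes) {
                return new LoadResult(loaded, bytes, true);
            }
            loaded.add(entity);
            bytes += size;
        }
        return new LoadResult(loaded, bytes, false);
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("entity-1", "entity-2", "entity-3", "entity-4");

        // Option 1: threshold on the number of entities.
        LoadResult byCount = load(log.iterator(), 2, Long.MAX_VALUE);
        System.out.println(byCount.entities.size() + " loaded, truncated=" + byCount.truncated);

        // Option 2: threshold on the data size (each sample entity is 8 bytes).
        LoadResult bySize = load(log.iterator(), Integer.MAX_VALUE, 20);
        System.out.println(bySize.bytesLoaded + " bytes loaded, truncated=" + bySize.truncated);
    }
}
```

A real fix would also have to decide what a reader sees when the load is 
truncated, for example falling back to reading the remaining entities from 
the log file on demand.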



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
