[ https://issues.apache.org/jira/browse/YARN-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903729#comment-14903729 ]
Shiwei Guo commented on YARN-4199:
----------------------------------

Sorry, I hadn't noticed [YARN-3448|https://issues.apache.org/jira/browse/YARN-3448] before. I think [YARN-3448|https://issues.apache.org/jira/browse/YARN-3448] solves the problem in a better way, so I have marked this issue as a duplicate of [YARN-3448|https://issues.apache.org/jira/browse/YARN-3448]. Thanks for the reminder.

> Minimize lock time in LeveldbTimelineStore.discardOldEntities
> -------------------------------------------------------------
>
>                 Key: YARN-4199
>                 URL: https://issues.apache.org/jira/browse/YARN-4199
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: timelineserver, yarn
>            Reporter: Shiwei Guo
>
> In the current implementation, LeveldbTimelineStore.discardOldEntities holds a
> writeLock on deleteLock, which blocks other put operations and eventually
> blocks the execution of YARN jobs (e.g. TEZ). When there are many history
> jobs in the timeline store, the blocking time can be very long. In our
> observation, it blocked all the TEZ jobs for several hours or longer.
> The possible solutions are:
> - Optimize the leveldb configuration, so a full scan won't take a long time.
> - Take a snapshot of leveldb and scan the snapshot, so we only need to hold
>   the lock during getSnapshot. One open question is whether taking a snapshot
>   takes a long time, because I have no experience with leveldb.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
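The second proposal in the quoted description (hold the lock only while taking a snapshot, then scan the snapshot) can be sketched roughly as follows. This is a minimal in-memory stand-in and not the actual LeveldbTimelineStore code: the class name, the map-based store, and the method names other than discardOldEntities and deleteLock are hypothetical. Copying the map stands in for leveldb's getSnapshot. The copy itself is linear in the store size, so the sketch shows only the locking pattern and says nothing about how expensive a real leveldb snapshot is, which is the question the description leaves open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical in-memory stand-in for the timeline store (not Hadoop code). */
class SnapshotDiscardSketch {
    // entity id -> start time in ms; stands in for the leveldb entity index.
    private final ConcurrentSkipListMap<String, Long> store = new ConcurrentSkipListMap<>();
    private final ReentrantReadWriteLock deleteLock = new ReentrantReadWriteLock();

    /** Puts share the read lock, so they are blocked only while the write lock is held. */
    void put(String entityId, long startTime) {
        deleteLock.readLock().lock();
        try {
            store.put(entityId, startTime);
        } finally {
            deleteLock.readLock().unlock();
        }
    }

    /** Removes entities whose start time is older than the cutoff; returns the count removed. */
    int discardOldEntities(long cutoffTime) {
        // Step 1: hold the write lock only long enough to take the snapshot.
        // (Assumption: the map copy plays the role of leveldb's getSnapshot.)
        TreeMap<String, Long> snapshot;
        deleteLock.writeLock().lock();
        try {
            snapshot = new TreeMap<>(store);
        } finally {
            deleteLock.writeLock().unlock();
        }

        // Step 2: the full scan runs against the snapshot with no lock held,
        // so concurrent puts are not blocked while it is in progress.
        List<Map.Entry<String, Long>> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : snapshot.entrySet()) {
            if (entry.getValue() < cutoffTime) {
                expired.add(entry);
            }
        }

        // Step 3: re-take the write lock briefly to apply the deletes.
        // remove(key, value) skips entries that were rewritten after the snapshot.
        int removed = 0;
        deleteLock.writeLock().lock();
        try {
            for (Map.Entry<String, Long> entry : expired) {
                if (store.remove(entry.getKey(), entry.getValue())) {
                    removed++;
                }
            }
        } finally {
            deleteLock.writeLock().unlock();
        }
        return removed;
    }

    int size() {
        return store.size();
    }

    boolean contains(String entityId) {
        return store.containsKey(entityId);
    }
}
```

With this pattern the write lock is held for the snapshot and for the final deletes, but not for the scan. Whether that actually shortens the blocking time in leveldb depends on the cost of the snapshot call, which would need to be measured.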