Harunobu Daikoku created YARN-9826:
--------------------------------------

             Summary: Blocked threads at EntityGroupFSTimelineStore#getCachedStore
                 Key: YARN-9826
                 URL: https://issues.apache.org/jira/browse/YARN-9826
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: timelineserver
    Affects Versions: 2.7.3
            Reporter: Harunobu Daikoku


We have observed several times on our production cluster that hundreds of
TimelineServer threads are blocked at the following synchronized block in
EntityGroupFSTimelineStore#getCachedStore while our HDFS NameNode is under high
load.
{code:java}
    synchronized (this.cachedLogs) {
      // Note that the content in the cache log storage may be stale.
      cacheItem = this.cachedLogs.get(groupId);
      if (cacheItem == null) {
        LOG.debug("Set up new cache item for id {}", groupId);
        cacheItem = new EntityCacheItem(groupId, getConfig());
        AppLogs appLogs = getAndSetAppLogs(groupId.getApplicationId());
        if (appLogs != null) {
          LOG.debug("Set applogs {} for group id {}", appLogs, groupId);
          cacheItem.setAppLogs(appLogs);
          this.cachedLogs.put(groupId, cacheItem);
        } else {
          LOG.warn("AppLogs for groupId {} is set to null!", groupId);
        }
      }
    }
{code}
While one thread holds the lock on cachedLogs, it performs multiple FileSystem
operations (fs.exists) inside getAndSetAppLogs. If those calls are slow, for
instance because the NameNode RPC queue is full, every other thread waiting to
enter the synchronized block stays blocked.

One possible solution is to move getAndSetAppLogs outside the synchronized 
block.
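A minimal sketch of that idea, under stated assumptions: CachedStoreSketch, the loadAppLogs function and the String-typed ids and values are hypothetical stand-ins, not the real EntityGroupFSTimelineStore, EntityCacheItem or AppLogs types. The cache is checked under the lock, the slow lookup (standing in for getAndSetAppLogs and its fs.exists calls) runs with the lock released, and the lock is reacquired only to insert the result with putIfAbsent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch only: names and types are stand-ins, not the real
 * EntityGroupFSTimelineStore API. The slow lookup runs outside the lock,
 * so a slow NameNode no longer blocks threads asking for cached entries.
 */
public class CachedStoreSketch {
  private final Map<String, String> cachedLogs = new HashMap<>();
  // Hypothetical stand-in for getAndSetAppLogs(); may be slow (remote fs calls).
  private final Function<String, String> loadAppLogs;

  public CachedStoreSketch(Function<String, String> loadAppLogs) {
    this.loadAppLogs = loadAppLogs;
  }

  public String getCachedItem(String groupId) {
    // Fast path: consult the cache while holding the lock only briefly.
    synchronized (cachedLogs) {
      String item = cachedLogs.get(groupId);
      if (item != null) {
        return item;
      }
    }
    // Slow path: do the expensive lookup WITHOUT holding the lock.
    // Several threads may race here for the same groupId; each does the lookup.
    String appLogs = loadAppLogs.apply(groupId);
    if (appLogs == null) {
      // Like the snippet above, a failed lookup is not put into the cache.
      return null;
    }
    // Reacquire the lock and keep whichever value was inserted first.
    synchronized (cachedLogs) {
      String existing = cachedLogs.putIfAbsent(groupId, appLogs);
      return existing != null ? existing : appLogs;
    }
  }

  public static void main(String[] args) {
    CachedStoreSketch store = new CachedStoreSketch(id -> "logs-for-" + id);
    // Prints "logs-for-group_1".
    System.out.println(store.getCachedItem("group_1"));
  }
}
```

The trade-off in this sketch is that concurrent cache misses for the same groupId may each issue the filesystem calls, with only the first result kept; in exchange, no thread waits on the lock while another is talking to the NameNode.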



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
