[ 
https://issues.apache.org/jira/browse/YARN-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156127#comment-14156127
 ] 

Zhijie Shen commented on YARN-2468:
-----------------------------------

bq. I would like to check how many log files we can upload this time. If the 
number is 0, we can skip this time. And this check is also happened before 
LogKey.write(), otherwise, we will write key, but without value.

I think Vinod meant that pendingUploadFiles is needed, but doesn't need to the 
member variable. getPendingLogFilesToUploadForThisContainer can return this 
collection, and pass it into LogValue.write by adding one param of it.

2. IMHO, the following code can be improved. If we use iterator, we can delete 
the unnecessary element on the fly.
{code}
      for (File file : candidates) {
        Matcher fileMatcher = filterPattern.matcher(file.getName());
        if (fileMatcher.find()) {
          filteredFiles.add(file);
        }
      }
      if (!exclusion) {
        return filteredFiles;
      } else {
        candidates.removeAll(filteredFiles);
        return candidates;
      }
{code}
This block could be:
{code}
...
while(candidatesItr.hasNext()) {
  candidate = candidatesItr.next();
  ...
  if ((not match && inclusive) || (match && exclusive)) {
    candidatesItr.remove()
  } 
}
{code}

3. [~jianhe] mentioned to me before that the following condition is not always 
true to determine an AM container. Any idea? And it seems that we don't need 
shouldUploadLogsForRunningContainer, we can re-use shouldUploadLogs and set 
wasContainerSuccessful to true. Personally, if it's not trivial to identify the 
AM container, I prefer to write a TODO comment and leave it until we implement 
the log retention API.
{code}
      if (containerId.getId() == 1) {
        return true;
      }
{code}

bq. It seems to be, let's validate this via a test-case.

Is it addressed by
{code}
    this.conf.setLong(YarnConfiguration.DEBUG_NM_DELETE_DELAY_SEC, 3600);
{code}
Is it better to add a line of comment of the rationale behind the config?

5. Can the following code
{code}
    Set<ContainerId> finishedContainers = new HashSet<ContainerId>();
    for (ContainerId id : pendingContainerInThisCycle) {
      finishedContainers.add(id);
    }
{code}
be simplified as
{code}
 Set<ContainerId> finishedContainers = new 
HashSet<ContainerId>(pendingContainerInThisCycle);
{code}

> Log handling for LRS
> --------------------
>
>                 Key: YARN-2468
>                 URL: https://issues.apache.org/jira/browse/YARN-2468
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: log-aggregation, nodemanager, resourcemanager
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>         Attachments: YARN-2468.1.patch, YARN-2468.2.patch, YARN-2468.3.patch, 
> YARN-2468.3.rebase.2.patch, YARN-2468.3.rebase.patch, YARN-2468.4.1.patch, 
> YARN-2468.4.patch, YARN-2468.5.1.patch, YARN-2468.5.1.patch, 
> YARN-2468.5.2.patch, YARN-2468.5.3.patch, YARN-2468.5.4.patch, 
> YARN-2468.5.patch, YARN-2468.6.1.patch, YARN-2468.6.patch, 
> YARN-2468.7.1.patch, YARN-2468.7.patch, YARN-2468.8.patch, 
> YARN-2468.9.1.patch, YARN-2468.9.patch
>
>
> Currently, when application is finished, NM will start to do the log 
> aggregation. But for Long running service applications, this is not ideal. 
> The problems we have are:
> 1) LRS applications are expected to run for a long time (weeks, months).
> 2) Currently, all the container logs (from one NM) will be written into a 
> single file. The files could become larger and larger.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to