[ 
https://issues.apache.org/jira/browse/YARN-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163116#comment-14163116
 ] 

Zhijie Shen commented on YARN-2582:
-----------------------------------

Thanks for the patch, Xuan! Here're some comments and thoughts.

1. This may not be sufficient. For example, "tmp_\[timestamp\].log" will become 
a false positive case here. We need to check whether the file name ends with 
TMP_FILE_SUFFIX or not.
{code}
          && !fileName.contains(LogAggregationUtils.TMP_FILE_SUFFIX)) {
{code}
{code}
      if (!thisNodeFile.getPath().getName()
        .contains(LogAggregationUtils.TMP_FILE_SUFFIX)) {
{code}

2. I'm thinking what the better way should be to handle the case that it fails 
at fetching one log file, but not the others. Thoughts?
{code}
        AggregatedLogFormat.LogReader reader = null;
        try {
          reader =
              new AggregatedLogFormat.LogReader(getConf(),
                thisNodeFile.getPath());
          if (dumpAContainerLogs(containerId, reader, System.out) > -1) {
            foundContainerLogs = true;
          }
        } finally {
          if (reader != null) {
            reader.close();
          }
        }
{code}

3. In fact, in the case of either single container or all containers, when the 
container log is not found, it is possible that 1) the container log hasn't 
been aggregated yet or 2) the log aggregation is not enabled. The the former 
case it is also possible to have the third situation: the user are providing an 
invalid nodeId. It's good to uniform the warning message.
{code}
    if (!foundContainerLogs) {
      System.out.println("Logs for container " + containerId
          + " are not present in this log-file.");
      return -1;
    }
{code}
{code}
    if (! foundAnyLogs) {
      System.out.println("Logs not available at " + remoteAppLogDir.toString());
      System.out
          .println("Log aggregation has not completed or is not enabled.");
      return -1;
    }
{code}

4. It seems that we don't have unit tests to cover the successful path of 
dumping the container log(s) currently. I'm not sure if it is going to be a big 
addition. If it is, let's handle it separately.

There're two general issues:

1. Currently we don't have the web page on RM web app to show the aggregated 
logs. One the other hand, LRS is supposed to be always in the running state, 
such that the timeline server will not present it. Therefore, we still haven't 
a web entrance to view the aggregated logs.

2. For LRS, the latest logs are not aggregated in the HDFS, but in the local 
dir of NM. When a user says he wants to check the logs of this LRS, it doesn't 
make sense that he can just view yesterday's log, because today's is still not 
uploaded to HDFS. It's good to have a combined view of both latest log on local 
dir of NM and aggregated logs on HDFS. Thoughts?

> Log related CLI and Web UI changes for LRS
> ------------------------------------------
>
>                 Key: YARN-2582
>                 URL: https://issues.apache.org/jira/browse/YARN-2582
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>         Attachments: YARN-2582.1.patch
>
>
> After YARN-2468, we have change the log layout to support log aggregation for 
> Long Running Service. Log CLI and related Web UI should be modified 
> accordingly.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to